ELEMENTARY STATISTICS 
AND APPLICATIONS 








FUNDAMENTALS OF THE 
THEORY OF STATISTICS 


BY


JAMES G. SMITH 


Department of Economics and Social Institutions 
Princeton University


AND 

ACHESON J. DUNCAN 

Department of Political Economy 
Johns Hopkins University



McGRAW-HILL BOOK COMPANY, Inc. 

NEW YORK AND LONDON 
1944 



ELEMENTARY STATISTICS AND APPLICATIONS

COPYRIGHT, 1944, BY THE
McGraw-Hill Book Company, Inc.

PRINTED IN THE UNITED STATES OF AMERICA

All rights reserved. This book, or
parts thereof, may not be reproduced
in any form without permission of
the publishers.



PREFACE 


Elementary Statistics and Applications is designed for a beginning
course. Principles of gathering and presenting statistics,
frequency-distribution analysis, probability theory and the
normal curve, correlation, time-series analysis, and forecasting
are included. Elementary sampling procedure, only so far as
it is founded upon the assumption of normal sampling
distributions, is also included.

No attempt has been made to include any of the less conventional
methods of time-series analysis. Some are too mathematical
for treatment in an elementary text. Others are so
highly specialized or so subjective as to be unsuited for textbook
material. Many of these are new methods that need to be
further systematized, coordinated, and tested in the crucible of
time and experience.

The approach in this book is that of the teacher. The authors
have been associated in teaching statistics for more than ten
years. The manuscript of the present text evolved during those
years in mimeographed form, modified from year to year as new
theories developed and as teaching use required. The suggestions
of students, whether consciously or unconsciously made,
have helped formulate this book. Experience has shown that
students gain a sense of the close association of statistics to
reality from the brief discussions of the historical origin of
important steps in the development of statistical theory that are
included.

The descriptions of frequency-distribution, correlation, and
time-series analysis are first completed in their simplest aspects,
with elementary illustrations. This enables the student to
visualize basic method unmixed with the more advanced phases.
More complex illustrations of practical application are then given
in separate chapters or separate sections. This practice
eliminates the apparent digression that seems to hamper the student
when exposition of method and complicated illustrations are
intermixed as in the conventional text. In separating the two,
moreover, the fact is recognized that the best order of
presentation for teaching is not the best order of procedure for working
an actual problem. For example, the handiest method for
making a frequency-distribution analysis is to set up a work sheet
with a Charlier check and first calculate the moments or the k
statistics, but the theories of the moments and of the k statistics
are among the most difficult parts of the analysis to explain and
are not, therefore, good introductory topics for the teaching of
frequency-distribution analysis. In addition, the practical
analysis of the frequency distribution introduces short cuts, cross
checks, or other timesaving devices. The authors believe that
this new arrangement will also prove to be a boon to research
workers who may use the text as a reference book.

The more advanced points of statistical theory pertaining to
frequency curves and sampling analysis have been placed in a
separate book entitled Sampling Statistics and Applications. The
two books together constitute a set on the subject of
Fundamentals of the Theory of Statistics.

In both volumes the authors have drawn freely upon the
many monographs and the periodical literature that have
appeared during recent years. Care has been exercised to
make acknowledgment in footnotes to the sources of new ideas
that have been incorporated into the authors’ own development
of the subject. To all these vigorous workers in the field, too
numerous to be listed by name, the authors, as well as other
statisticians, are greatly indebted.

More particularly, the authors here acknowledge a debt of
gratitude to several generous professional colleagues who have
read parts of the manuscript with critical and judicious eye.
Sidney W. Wilcox, Chief Statistician of the Bureau of Labor
Statistics in the United States Department of Labor, made
especially helpful suggestions for Chap. XIX, Index Numbers,
for the chapters on probability theory, and for Parts I and II
of Elementary Statistics and Applications. John H. Smith of
the Bureau of Labor Statistics contributed many stimulating
criticisms and suggestions that the authors believe inspired
important improvements. Lester S. Kellogg, Bureau of Labor
Statistics, read Chap. III, Sources of Statistics, and made
suggestions that led to a constructive reworking of that material.
The authors are profoundly grateful for such generous assistance
and wish to make full acknowledgment of their professional
indebtedness to these men.

The authors are grateful to the International Finance Section
of Princeton University for the financial assistance given Acheson
J. Duncan some years ago to enable him to study statistics and
mathematical economics with the late Henry Schultz of the
University of Chicago and with Harold Hotelling of Columbia
University. The authors are indebted to those men, and to
colleagues in the Mathematics Department at Princeton
University. The authors are also indebted to Professor R. A.
Fisher, also to Messrs. Oliver & Boyd, Ltd., of Edinburgh, for
permission to reprint an abridged edition of Table III, Table
of χ², from their book Statistical Methods for Research Workers.

Naturally, it is not to be supposed that the whole or any part
of the manuscript carries the endorsement of the authors’ former
teachers or those who have helped with criticisms of the
manuscript. The authors assume full responsibility for errors of
theory or calculation that may be present in the volumes.

James G. Smith. 
Acheson J. Duncan. 

Princeton, N. J.,
August, 1944.



CONTENTS

Preface

PART I
INTRODUCTION

Chapter
I. Statistics in the Arts and Sciences
II. Gathering Statistics
III. Sources of Statistics
IV. Presentation of Statistics
V. Statistics: A Study of Variation

PART II
ANALYSIS OF FREQUENCY DISTRIBUTIONS

VI. Summarization and Comparison
VII. Illustration of Frequency-distribution Analysis

PART III
THE NORMAL FREQUENCY CURVE

VIII. Probability
IX. Probability Distributions
X. Probability Calculus
XI. Symmetrical Binomial Distribution and the Normal Curve
XII. Use of the Normal Frequency Curve in Sampling Analysis

PART IV
STUDY OF BIVARIATES AND MULTIVARIATES

XIII. Simple Correlation
XIV. Computation of r and Other Measures of Correlation
XV. Nonlinear Correlation
XVI. Multiple and Partial Correlation
XVII. Analysis of a Multivariate Frequency Distribution Illustrated
XVIII. Normal Frequency Surface

PART V
STUDY OF DYNAMIC VARIABILITY

XIX. Index Numbers
XX. Rational Basis of the Analysis of Time Series
XXI. Trend Analysis
XXII. Orthogonal-polynomial Trends
XXIII. Time-series Analysis: Seasonal Variation
XXIV. Determination of Cycle

PART VI
FORECASTING

XXV. The Art of Forecasting with Statistics

APPENDIX

Table I. Four-place Common Logarithms of Numbers
Table II. Squares of Numbers
Table III. Square Roots of Numbers from 10 to 100
Table IV. Square Roots of Numbers from 100 to 1000
Table V. Reciprocals of Numbers
Table VI. Areas under the Normal Curve
Table VII. Ordinates of the Normal Curve
Table VIII. Hyperbolic Functions

Author Index

Subject Index



PART I 
Introduction 

CHAPTER I 

STATISTICS IN THE ARTS AND SCIENCES 

From the sixteenth century to the present day, modern sciences
have stressed empirical method: the gathering of data by
laboratory experiment or by statistical observation. Laboratory
experimentation has been more spectacularly employed in the
natural sciences (biology, chemistry, physics, botany, and the
like), and statistical observation has been more widely used in
the social sciences (such as politics, economics, and psychology).
Yet laboratory technique is used for some types of investigation
in the social sciences, especially in psychology, education, and
agriculture; and statistical technique is frequently employed
in the natural sciences; for example, the modern kinetic theory
of gases is a statistical argument.

Economy and Flexibility of Statistics. Meaning of Statistics.
Statistics and scientific method are of value wherever a mass of
complicated facts exists and wherever those facts are amenable to
quantitative expression. Qualitative knowledge must be converted
into quantitative units of enumeration or of measurement
before it becomes statistics. The quantitative units are either
enumerative or measurement units. An enumerative unit
depends upon proper definition of the objects to be counted;
thus statistics may be compiled on the number of blue-eyed
as compared with brown-eyed people, the number of yellow as
compared with green beans, etc. A measurement unit depends
upon contrivance of some unit of measurement for the purpose
of converting qualitative knowledge into quantitative expression;
thus properly devised tests make it possible to measure
intellectual aptitude on a scale so that certain quantity figures can
be depended upon to measure relative amounts of intellectual
aptitude.

Such quantitative description of facts makes it possible to
give in a brief space a great amount of information. In order
to accomplish this economy of time and space, however, it is
of the greatest importance that the units of measurement or of
enumeration be uniformly applied and that the nature of these
units of measurement for observation or of enumeration be
constantly kept in mind when the data are used. Furthermore,
having always in mind the nature of the statistical units chosen
as criteria of measurement, it is possible to arrange statistical
data in such a manner as greatly to facilitate their interpretation.

A large degree of flexibility is thus available when facts are
expressed quantitatively, and so long as the original units of
measurement are not obscured, it is possible for specific purposes
to arrange and rearrange a given set of data. A part of this
flexibility is due to the fact that otherwise long, time-consuming
methods of analysis can be resolved into relatively simple
mathematical operations. These short cuts and the savings of
human effort they make possible in the search for truth are only
possible where knowledge can be expressed quantitatively,
which is to say by statistics. In using these short-cut methods,
however, it is necessary to be ever watchful for hidden
inconsistencies with the original units of measurement, for it is in this
realm that many of the misuses of statistics are found.

Economy and a high degree of flexibility are characteristics of
statistics that well fit them to serve a dynamic society’s needs for
analysis and formulation of policy. It is a lesson learned from sad
but profitable experience that statistics are something more than
the mere will to collect facts in quantitative form. Careful study
by many scholars has given rise to rules of procedure that must be
followed if the economy and flexibility of which statistics are
capable are to be realized. These rules of procedure constitute
the science of statistics, to several aspects of which attention
should be directed for differentiation. “Statistics” is used
broadly to refer to the whole field of the quantitative approach to
knowledge, including the gathering of data, problems of statistical
measurement, statistical analysis, statistical theory, and scientific
method in general. The word “statistics” is also used to refer to
any one of these parts of the whole subject.


Accordingly, while “statistics” is used in the broad sense
indicated, it applies also more particularly and more accurately
to compiled data that are systematic and quantitative expressions
of facts or events.

The theory of statistics is also called “statistics.” The theory
of statistics is the body of principles that has been developed,
partly a priori by the mathematical approach and partly by
empirical methods, to serve as a guide for sound statistics and
sound statistical method. Understanding of the theory of
statistics is required also for compiling statistics. Statistical
theory is required because nearly all compilations of quantitative
facts are samples and not complete enumerations and because the
fundamental rules regarding units of measurement must be
obeyed in statistical enumeration if the resulting data are to be
homogeneous, that is, comparable one with another.

“Statistics” also refers to statistical method, a term used to
describe the process of interpreting facts by the use of statistics
and statistical theory. Careful study of the assembled
statistical data, obtained in a manner to secure internal
comparability and arranged in well-planned tables, may be used as a
basis for judgments or action. Further quantitative treatment,
however, may frequently give greater significance to the
statistics. Selected summaries may bring out many relationships
that would be difficult to visualize if they were in tables of figures
that had been compiled for general purposes. This additional
quantitative treatment is of the nature of classification and
summarization. It is called “statistical analysis” and includes
the methods of tabulation, graphs, averages, measures of
variability, correlation, index numbers, and similar quantitative
analyses that have been developed. Judgments based on
statistical analysis are called “statistical inferences.”
Statistical method, then, consists of two parts, (1) statistical analysis
and (2) statistical inferences.

In recent years the word “statistics” has also been used to
describe figures that have been obtained by statistical analysis;
for example, arithmetic means, average deviations, measures of
correlation, and the like, are all called “statistics,” and any one
of them alone is called a “statistic.”
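
The “statistics” named above can each be computed directly from
a small set of observations. The following Python sketch is
illustrative only: the function names and the sample figures are
assumptions introduced here, not material from the text, and the
measure of correlation shown is the ordinary product-moment
coefficient.

```python
from math import sqrt


def arithmetic_mean(values):
    """Sum of the observations divided by their number."""
    return sum(values) / len(values)


def average_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    mean = arithmetic_mean(values)
    return sum(abs(v - mean) for v in values) / len(values)


def correlation(xs, ys):
    """Product-moment coefficient of correlation between paired observations."""
    mean_x, mean_y = arithmetic_mean(xs), arithmetic_mean(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical paired observations (heights in inches, weights in pounds).
heights = [60, 62, 64, 66, 68]
weights = [110, 120, 130, 140, 150]

print(arithmetic_mean(heights))       # one "statistic"
print(average_deviation(heights))     # another
print(correlation(heights, weights))  # a measure of correlation
```

Each printed figure is a “statistic” in the narrow sense described
above; the list of raw observations is “statistics” in the sense of
compiled data.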

The word “statistics” is thus used to mean all these various
things together and any one of them separately. This may make
for confusion, and in the above discussion such usage makes it
appear as if terms were defined by use of the term defined, but
such is established conventional, albeit confused, use of this
all-inclusive word “statistics.”

THREE TYPES OF DATA 

Empirical vs. Experimental Data. Answering the accusation
that their conclusions are so vague and unpredictable as to
preclude scientific sanction, the social scientists have often pleaded
that social studies cannot, like the theories of the natural sciences,
be tested in the laboratory. The social sciences must rely only
on statistics and empirical or historical methods. Social theories
can be interpreted with respect to true life only if viewed in the
light of a ceteris paribus assumption. The assumption that
other things are equal, or unchanged, or in balance serves the
social scientist in the same manner as controls over experimental
conditions serve the natural scientists.

With the development, on the one hand, of statistical methods
in the natural sciences and the development, on the other hand, of
experimental methods in the social sciences, this contrast is
becoming less real. While it is still true that social science
predominantly uses empirical or historical data, some important
work has been done, and more important work appears in the
offing, with experimental data in the fields of psychology,
sociology, education, medicine, population studies, agricultural
economics, and statistical control of quality of manufactured
products. Such outstanding progress in the technical development
of this experimental work has been made as to constitute
almost a special field called “design of experiments.”¹

¹ Fisher, R. A., The Design of Experiments (1935).

Design of Experiments. The arrangements for making the
experiment and for recording the data therefrom constitute the
design of the experiment. In designing an experiment, methods
of so controlling the experiment as to prevent biased results must
be devised. If, for example, the experiment is to test the effects
upon cotton culture of a certain kind of fertilizer, several areas in
various localities may be chosen in order to test the effect of
the selected fertilizer under a number of climatic conditions.
The design for the experiment must then plan also some means of
measuring these various other influences, namely, temperature
and rainfall. Some method must also be devised for discovering,
in the resulting data, not only how much of the productivity of
cotton is due to the fertilizer, but also how much is due to the
differing qualities of the soil, to varying amounts of rainfall, and to
varying levels of temperature. The design for the experiment
must plan and organize the procedure so that from the resulting
data it will in truth be possible to measure the net influence of the
new fertilizer.

Where cost is a consideration, and it seldom is not, an important
part of the design of experiment is to decide to what extent
to experiment, in other words, how small an experiment will give
trustworthy results. Before doing this it must be decided how
much precision in the results, for practical purposes, is required.
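
One conventional way to make the question “how small an
experiment?” concrete is sketched below in Python. It assumes a
normal sampling distribution for the mean and a standard deviation
known in advance; both assumptions, like the function name and the
figures, are introduced here for illustration and are not taken
from the text.

```python
from math import ceil


def required_sample_size(sigma, margin, z=1.96):
    """Smallest number of observations n for which z * sigma / sqrt(n)
    does not exceed the chosen margin of error.

    sigma  -- assumed standard deviation of a single observation
    margin -- precision required for practical purposes
    z      -- normal deviate for the desired confidence (1.96 ~ 95 per cent)
    """
    if sigma <= 0 or margin <= 0:
        raise ValueError("sigma and margin must be positive")
    return ceil((z * sigma / margin) ** 2)


# Hypothetical plot yields with a standard deviation of 10 units.
print(required_sample_size(sigma=10, margin=2))
print(required_sample_size(sigma=10, margin=1))
```

Under these assumptions, halving the permitted margin roughly
quadruples the number of observations required, which is why the
precision needed must be settled before the size of the experiment.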

The solution of some of the problems relating to design of
experiment may be found by applying the theory of statistics.
The solution of others is a matter of common sense, which
sometimes is more difficult to apply than might be supposed.

Not only in such a case as testing the use of fertilizer, but in
many problems, the researcher finds that a number of factors
influence a given result. In agricultural phenomena, weather,
climate, and other natural and human factors are present; in
medicine, age, sex, and other conditions affect the application of
treatment; in biochemical and in psychological experimentation,
many human and natural variables enter. When it is necessary
for a given purpose of analysis to isolate one of several influences,
the data can be so selected or the treatments so applied as to hold
other influences constant. For example, if age and sex as well as
inoculation affect the outcome of pneumonia cases, the inoculation
can be tested by comparing inoculated and noninoculated for
those of the same sex or age group. It has become the practice
to call the noninoculated group the “control” in the experiment.¹
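
The comparison described above, inoculated against control within
the same age and sex group, can be sketched in Python as follows.
The case records, the grouping labels, and the function name are all
hypothetical and serve only to show the arithmetic of holding the
other influences constant.

```python
from collections import defaultdict

# Hypothetical records: (age_group, sex, inoculated, recovered).
CASES = (
    [("under 40", "M", True, True)] * 3
    + [("under 40", "M", True, False)] * 1
    + [("under 40", "M", False, True)] * 2
    + [("under 40", "M", False, False)] * 2
    + [("40 and over", "F", True, True)] * 1
    + [("40 and over", "F", True, False)] * 1
    + [("40 and over", "F", False, False)] * 2
)


def recovery_rate_differences(cases):
    """For each (age group, sex) stratum, return the recovery rate of the
    inoculated minus that of the noninoculated control group."""
    # (stratum, inoculated) -> [number recovered, number of cases]
    tally = defaultdict(lambda: [0, 0])
    for age, sex, inoculated, recovered in cases:
        cell = tally[((age, sex), inoculated)]
        cell[0] += int(recovered)
        cell[1] += 1

    differences = {}
    for stratum in {key[0] for key in tally}:
        # A stratum is comparable only if it has both groups.
        if (stratum, True) not in tally or (stratum, False) not in tally:
            continue
        treated = tally[(stratum, True)]
        control = tally[(stratum, False)]
        differences[stratum] = treated[0] / treated[1] - control[0] / control[1]
    return differences


for stratum, diff in sorted(recovery_rate_differences(CASES).items()):
    print(stratum, round(diff, 2))
```

Because each difference is taken within a single age and sex group,
neither age nor sex can account for it; only the inoculation differs
between the two groups compared.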

Hypothetico-observational Data. In addition to empirical and
experimental data, the sciences make extensive use of a third type,
namely, hypothetico-observational data.² For example, in the
physical sciences, that the moon is about 240,000 miles from the
earth is a hypothetico-observational datum: no one has carried
out the experiment of measuring the distance from the earth to
the moon, yet on the basis of certain hypotheses it is measured to
a comparatively high degree of precision. In the social sciences,
index numbers purporting to measure such items as the general
level of prices are hypothetico-observational data. In both
illustrations, upon the basis of certain hypotheses or theories,
practical methods are devised for estimating the measurement in
question. In appraising the resulting estimate, the precision of
the underlying theory or hypothesis is of primary importance.

¹ Cf. Hill, A. Bradford, Principles of Medical Statistics (1939),
pp. 4-8 and 170-178.

² Eddington, Sir Arthur, The Philosophy of Physical Science (1939),
pp. 12-14.

SERVING THE ARTS AND SCIENCES

Statistics and the Social Sciences and Arts. Politics. Public
opinion, the opinion of the masses, can be ascertained at any time
on a wide variety of social and political issues by means of
statistical data collected by random questionnaires from a
comparatively small number of people. The employment of
statistical technique for this purpose has stirred the imagination
and stimulated the ingenuity of students of the social and
governmental processes. The widespread demand for such
information and the relatively low cost of obtaining it by the
sampling method have also gratified the acquisitiveness and
lined the purses of a number of enterprising polling agents.
Increasingly, political strategists appear to pay attention to these
systematic statistical studies of public opinion. Both the major
political parties in the United States have had expert statisticians
engaged during the quadrennial presidential campaigns to keep
their fingers on the pulse of public opinion.
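
How a comparatively small random sample can stand for a large
public may be illustrated with a short Python sketch. The electorate,
its size, the 55 per cent figure, and the function name are
hypothetical, and the standard error uses the usual normal
approximation for a proportion, an assumption added here.

```python
import random
from math import sqrt


def poll(population, sample_size, seed=0):
    """Estimate the proportion of favourable opinions (coded 1) from a
    simple random sample drawn without replacement.

    Returns (estimated proportion, approximate standard error)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(population, sample_size)
    p = sum(sample) / sample_size
    standard_error = sqrt(p * (1 - p) / sample_size)
    return p, standard_error


# Hypothetical electorate of 100,000 persons, 55 per cent in favour.
ELECTORATE = [1] * 55_000 + [0] * 45_000

estimate, error = poll(ELECTORATE, 1_000)
print(f"estimated share in favour: {estimate:.3f} (standard error {error:.3f})")
```

In this sketch one person in a hundred is questioned, yet the
standard error of the estimate is under two percentage points; the
relatively low cost of the sampling method comes from this property.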

It has been claimed that “sampling referenda make the mass
articulate, define the mandate of our leaders, reveal the true
popular strength of pressure groups, and show social taboos
quantitatively for what they are worth,” that they are, in
the language of journalism, “the fourth dimension for the Fourth
Estate.”¹

¹ Gallup, George, “Government and the Sampling Referendum,”
Journal of the American Statistical Association, Vol. 33 (1938),
pp. 131-142.

Governmental Administration. Statistics are extensively used
as guides to various kinds of governmental administration, such
as sanitation, hospitalization, highway supervision, and public
industrial accident and compensation insurance laws. For example,
on the assumption that industrial accidents are due to unsafe
conditions and unsafe practices that, if eliminated, would
prevent repetition of the same or similar types of accident,
statistical data on causes of accident have been assembled.
Study of these data enables the statistician to identify and select
the unsafe elements in transportation conditions and then to
present the data to safety engineers for guidance in accident
prevention.¹

From its beginning in 1790 to the present day, the Federal
government has considered statistics on foreign trade so important
that an organization has been maintained for the express
purpose of assembling such statistics. In the early years of the
republic they were gathered by the Treasury Department, but
now they are collected and published regularly by the Department
of Commerce. With the rapid development of large-scale
business organization in the latter half of the nineteenth century,
public policy with respect to social and economic conditions has
required the Federal government to maintain a Bureau of Labor
Statistics, which has been engaged principally in the task of
collecting and publishing statistics on prices, cost of living, and
wages and, in more recent years, on employment and pay rolls in
manufacturing industries of the United States.

It is a matter of common knowledge to all who read newspapers
that important laws are passed by city, state, and Federal
governments on the basis of statistical facts assembled regularly
or collected by special legislative committees. For example, the
Federal Reserve System of banks in the United States was created
in 1913 after a thorough study, involving extensive use of
statistics, of the banking situation in this and other countries;
legislation in the decade of the 1930’s on public works, relief work,
and social security was largely based on studies of a statistical
nature.

¹ Kossoris, M. D., “A Statistical Approach to Accident Prevention,”
Journal of the American Statistical Association, Vol. 34 (1939), p. 526.

Business Administration. Statistics are valuable in business
administration, enabling the manufacturing executive to obtain
more or less satisfactory answers to such a perplexing question as:
Making allowance for seasonal changes and expected prices of
substitute goods, what will be consumer demand for the coming
year? Some manufacturers usually make estimates for a year in
advance; others can proceed successfully with monthly estimates.
Retail store executives frequently require weekly or even daily
estimates on some articles, while sellers of perishable vegetables or
other foods may even have to make hourly estimates.

When the manufacturing executive has a satisfactory answer
to the above type of question, he can schedule production to
maintain as nearly level a rate as is feasible and to keep as
constant a labor force as possible. In some large business
enterprises statistics are assembled daily on working-capital
position, factory expense, output, and consumer credit extended.
Control by the executive is kept flexible and timely by a
continuous stream of statistics both on the internal state of the
business and on external economic conditions. As one rather
erudite businessman says, “There has been an insistence from the
very top of the organization on getting the facts so that we might,
to apply Descartes’s picturesque phrase, ‘be clear about our
actions and walk surefootedly in this life.’”¹

¹ Hayford, F. L., “Some Uses of Statistics in Executive Control,”
Journal of the American Statistical Association, Vol. 31 (1936),
pp. 31-37.

In his determination of policies regarding prices, production,
and employment in his own business, the enterpriser must
make judgments based upon knowledge of the world of prices in
which he lives. Prices he must pay for raw materials, for labor,
for equipment and its upkeep are his guide for determining his
own production activity and the price he can eventually obtain
for his product. Since all or at least part of the system of prices,
that is, the prices he pays and the prices consumers pay for
competitive or substitute articles, is beyond his control, the
individual producer adapts his plans to any uncontrollable
conditions he finds in the market. It is by the use of statistics that the
modern businessman comes to understand conditions to which, if
he is to profit, he must succeed in adapting his own business.

During recent years polling agencies have been hired by
business executives to obtain certain types of information with
respect to potential markets and changes in consumer tastes or
habits. Student groups and student publications on the campuses
of colleges and universities are employed by businessmen to
make widespread use of polling techniques. It has also been
found that a carefully conducted student poll can do more
to make college administrators and trustees cognizant of student
attitudes toward vital campus issues than the older and less
effective means of circulating petitions. In the large university
the student poll performs many of the functions of the open
forum in a small university or college. Similarly, merchants and
classes in advertising can determine the efficacy of advertising
by the extent to which students express a preference for branded
and highly advertised cigarettes, toilet articles, school supplies,
and items of clothing to the little or nonadvertised varieties.
The radio programs to which students listen, the magazines to
which they subscribe, the amounts they spend for various
budgetary items, the type of motion picture they most enjoy,
the mileage they travel, and the means of transportation they
prefer are typical items of information eagerly sought by
advertising organizations and business firms in college and other kinds
of markets as well.¹

In a wide variety of practical ways the statistical principle
of sampling is used in business. For example, by the use of a
small spectroscope an entire trainload of pig iron can be tested.
The spectroscopist opens the car door, fastens a wire to a sample
pig, strikes an electric arc between this and a bar of pure iron he
carries, and observes the light in the spectroscope. The bands of
color in the spectroscope reveal to him whether or not the amount
of impurity in the pig is below a previously determined standard.
By properly selecting sample pigs at random the trainload of
metal can be tested before it is unloaded.² In a similar manner,
though perhaps with less sensational instruments than the
spectroscope, other types of more or less homogeneous or
standardized goods, such as shipments of ores, grains, oranges, potatoes,
or lettuce, can be tested by sampling.
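
The testing of a trainload by sample pigs can be sketched in Python
as below. The impurity figures, the standard, the sample size, and
the rule that every sampled pig must pass are assumptions made for
illustration; the text does not state the rule the spectroscopist
applies.

```python
import random


def inspect_trainload(impurity_by_pig, sample_size, standard, seed=0):
    """Draw a random sample of pigs and accept the trainload only if every
    sampled pig shows impurity below the previously determined standard.

    Returns (accepted, indices of sampled pigs that failed)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sampled = rng.sample(range(len(impurity_by_pig)), sample_size)
    failures = [i for i in sampled if impurity_by_pig[i] >= standard]
    return len(failures) == 0, failures


# Hypothetical trainloads of 500 pigs; impurity as a fraction by weight.
clean_load = [0.01] * 500
impure_load = [0.10] * 500

print(inspect_trainload(clean_load, sample_size=20, standard=0.05))
print(inspect_trainload(impure_load, sample_size=20, standard=0.05)[0])
```

Only 20 of the 500 pigs are examined in either case, so the load can
be judged before it is unloaded; the more homogeneous the goods, the
smaller the sample that will serve.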

Fjfivcntinn . The grading and selection of teachers have in some i 
instances been based upon intelligence tests, Avhich have been / 
perfected by the use of statistical technique correlating test / 
grades Avith empirical results.^ The scientific use of intelligence 


For further illustrations see, for example, tV. B. Dygert, Radio 0^ an 
Advertising Medium; H. E. AgneAV and W. B. Dygert, Advertising Media; 
E. R. tValtcr, Effective Marketing; E. H, Scliell and F. F. Gilmore, Manual 
for Executives and Foremen; and H. B- Maynard and G. J. Stegemerten, 
Operation Analysis. 

= Harrisox, G. B., Atoms in Action (1939), p.' 165. Cf., on sampling for 
grading carl; of iron ore, SteAvart R. Holbrook, Iron Brew (1939), pp. 164r-165. 

= West, Michael, “The Psychology of the Teacher,” Journal of Educa- 
tion, March, 1939, p. 158. 



10 


INTRODUCTlOf^ 


tests has developed since the First World War In 1917 a test 
called the Amenean Army Intelhgence Test v.&s gi\en to the 
drafted soldiere The set of questions included on the Array 
Intelligence Test ^cre based upon the cumulatne experience, 
coraparatuely limited in extent with such tests up to that time 
The var experience with the tests proved to be a landmark m 
their de\elopment m that it constituted a major experiment 
in their u'^e and stimulated rapid de\ elopment m the pnnciples of 
their u'se ‘ Subsequently, the art of constructing questions for 
testing intelhgence now called “ aptitude ”in order to contrast the 
testing of natural ability with the mere testing of acquired ability, 
has greatlj progressed In addition to the college-entrance tests, 
which in part measure opportunity, scholastic aptitude tests are 
used bj the leading universities as a basis for selecting students 
As a consequence statistical data that measure not onlj acquired 
intelligence but al«o native ability, or aptitude, are being accumu- 
lated The aptitude-test rating is often called the “intelligence 
quotient,” or simply I Q 

Mental tests have most frequently been employed with the 
feeble-minded m connection with problems of detection and place 
ment and for determining the tjqie of training best suited to 
individual persons Studies of cnmmals by the use of intelhgence 
tests disclose relationships between intelligence and the type 
of crime committed but apparentlj a high 1 Q neither prevents 
nor stimulates crime m general Dehnquent children have been 
found to exhibit more neurotic traits than do unselected school 
children Tests of emotional control, dishonesty, and lack of 
self-control hav e been found useful in forecasting incorrigibility 
among dehnquent children 

Recently a study was made in which the I.Q.’s of 214 foster
children, all of whom were adopted before the age of twelve
months, and of 105 control children living with their own parents
were compared with the I.Q.’s of the foster and real parents. The
I.Q. of the parents was supplemented by information on
occupational status and other pertinent data. Information regarding
the true parents of adopted children was secured from placement
records. There was far greater correspondence between the

1 Brigham, Carl C., “Two Studies in Mental Tests,” Psychological
Review, Psychological Monographs, Vol. 21 (1917); A Study of American
Intelligence (1923); A Study of Error (1932).



STATISTICS IN THE ARTS AND SCIENCES 




I.Q.’s of foster children and their true parents than between
the I.Q.’s of foster children and their foster parents. It was
estimated by statistical techniques that the contribution of
heredity to individual differences in I.Q. is probably not far from
70 to 80 per cent and that the very best environment might,
however, raise I.Q. as much as 20 points, while the poorest
environment might lower it as much as 20 points.¹

Sociology. Modern sociology employs statistical method
almost to the exclusion of other methods. This may be a
mistaken emphasis that will be corrected by future sociologists,
but in that discipline the twentieth-century reaction to
nineteenth-century abstraction was particularly great. Moreover,
this extreme emphasis upon fact by American sociologists² can
be traced to the picturesque Lester Frank Ward, who, despite
the abstract qualities of his writing, emphasized the statistical
approach. A farmer, a Civil War soldier, a Federal government
official, a lawyer, a botanist, a chief of the Division of Navigation
and Immigration, and, finally, toward the end of his life, a
professor of sociology at Brown University, Ward came to the study
of sociology with a richly varied experience. Among the voices
raised against nineteenth-century emphasis on nature and the
neglect of humanity his was the most vigorous. So eager was the
reading world for this new approach that some of Ward’s books
were translated into every Continental tongue.³

Economic Theory. From Adam Smith to the present time 
economic theory has been, at least in part, an inductive science. 
In Adam Smith’s day there were few statistics, but he made 
extensive use of trade, price, and wage data in his analyses. In 
modern times, especially since the turn of the century, more 


1 Study by Miss Burks, described in H. E. Garrett and M. R. Schneck,
Psychological Tests, Methods, and Results (1933), pp. 189-190.

2 Lynd, R. S., and H. M. Lynd, Middletown: A Study in Contemporary
American Culture (1929); Middletown in Transition: A Study in Cultural
Conflicts (1937). These remarkable books are modern classics in sociology
and are based almost entirely upon observational method largely statistical
in character.

3 Chugerman, Samuel, Lester F. Ward, The American Aristotle (1939).
Because of Ward’s optimistic views it has been suggested lately that he should
be widely read both for information and encouragement. Cf. a review of
Chugerman’s book by Prof. Rudolph Binder in The New York Times Book
Review, Oct. 15, 1939, p. 10.





squares was first discovered and applied in astronomy early in the
nineteenth century. The method continues to be employed in
astronomy to trace the paths of stars, comets, planets, and other
heavenly bodies. Modern astronomy deals with large numbers
of observations, which become the statistical raw material for the
science. For example, the Harvard College Observatory receives
monthly, from nearly one hundred different observers distributed
the world over, and on report blanks containing seven
to seven hundred observations each, an average of forty-five
hundred observations. It has been found best not to attempt to
analyze each observer’s work separately, but instead to depend on
multiplicity and frequency of observations well distributed
throughout, to obtain the best possible light curves. Over fifty
thousand observations come to the Harvard College Observatory
each year, and from 1911 to 1939 it collected three-quarters of a
million observations.¹

For years the Smithsonian Institution has been using methods
essentially statistical in nature to record measurements of the
amount of heat received from the sun by the earth. Smithsonian
stations in three of the most arid regions of the earth are daily
recording the sun’s radiation. Observers in Chile, in South
Africa, and in Western United States have been taking records.
According to these observations, which have been made at widely
separated stations, correlations exist between changes in solar
radiation and temperatures on the earth. Study of these records,
study of records of the earth’s weather as recorded in the growth
rings of trees, and study of similar phenomena have revealed
recurrent cycles in the weather that may be of great value in
foretelling long-range trends in the future succession of fat and
lean years.²

Zoology. A considerable amount of the experimental work in
the life sciences involves such quantitative considerations as
weights, measurements, enumerations, pointer readings of various
kinds, comparisons, and classifications. If the results arrived at
by experimentation are to give rise to general principles rather

1 Cf. Campbell, Leon, “The Light Curve of SS Cygni, 213843,” Annals
of Harvard College Observatory, Vol. 90, No. 3, pp. 93-162; Sterne, T. E., and
Leon Campbell, “Properties of the Light Curve of SS Cygni,” ibid., Vol. 90,
No. 6, pp. 189-206.

2 So says G. R. Harrison, op. cit., pp. 290-291.





than just to meaningless and incoherent single observations, the
zoological data must be consistently assembled, uniformity of
units must be observed, and the data classified. In other words,
statistical method must be used to bring order from isolated
chaotic measurements.

In addition to routine problems of analysis in zoology,
statistical and mathematical devices have had interesting
applications in certain special problems. For example, in 1934 Zeuner
used a statistical study of a system of cranial angles as a basis for
biological inferences regarding rhinoceroses; in 1930 Soergel
emphasized the importance of statistical methods for certain
paleontological problems, employing numerical and mathematical
procedures to study footprints and from these drawing
inferences regarding the animals that made them; and in 1912
Ridgway attempted to put the study of faunal coloration on a
statistical basis.¹ Paleontologists use various mathematical
and graphic means to restore missing parts in fossil animals and
to reconstruct hypothetical intermediate stages between less and
more specialized animals. They also use statistical methods
to study averages and variation in characteristics of different
age groups, rate of growth, and the like, of various animals.²

Biology. Considering the modern emphasis on statistics in the
social sciences, it is interesting to note not only that the method of
least squares was first applied in a natural science, but also that a
second highly important statistical method was first developed in
the natural science of biology. This is the statistical
measurement of correlation, which in the 1870’s was used by Sir Francis
Galton to measure the effect that characteristics of midparents
(that is, the average of their two parents) had on their children.³
Biological experimentation in the nineteenth and twentieth
centuries, involving as it does rats, guinea pigs, and the like,
makes use of procedures that combine the laboratory test with
the assembling of statistical data and their subsequent analysis.
In this way the effects and incidence of various diseases and

1 Soergel, “Die Bedeutung variationsstatistischer Untersuchungen für
die Säugetier-Paläontologie,” Band 63, pp. 345-450; Ridgway, R., Color
Standards and Color Nomenclature (1912). Also see Simpson, G. G., and
Anne Roe, Quantitative Zoology (1939), pp. 21, 401-406.
2 Simpson and Roe, op. cit., p. 335.
3 See Chap. VIII.





of various cures for those diseases are measured; thus also are
tested the various theories regarding the relative importance
of hereditary factors as compared with environmental factors in
animal life. Much of this experimental work later becomes the
basis for theories regarding human life and for theories in respect
to the effects of human diseases and their cure.

Some problems in biology have interesting applications to
the homely arts of living. A recent illustration of the use of
statistics in biology is the standardizing of liquid household
insecticides, a matter of considerable importance to certain
private enterprisers engaged in the business.¹ By a series of
experiments that established the sex ratio of houseflies
statistically, hitherto unknown sources of variability in the effects
of insecticides were thrown into bold relief. It was found,
for example, that flies at ages of less than three days vary
considerably in their reaction to the spray, while flies four to six
days old exhibit a fairly constant susceptibility. It was known
that male houseflies are markedly more susceptible to certain
sprays than female houseflies.

A recent book on heredity² illustrates the extent to which
biology depends upon statistical technique. Widespread interest
in the Dionnes led biologists to calculate the probability of
quintuplets as compared with the probability of twins. The
probability of quintuplets is 1/41,600,000, while that of twins
is -gV-. In addition, the statistical method was used in an
interesting way to answer the question of heredity vs. environment,
epitomized in the highly talented musical family of Johann
Sebastian Bach, a talent that ran through five generations.
Were the Bachs musical because of inborn talent or because
of the musical environment in the home? To answer this
question the author of the above-mentioned book resorted to
statistical technique. He obtained information from 36
outstanding instrumental musicians, from 36 principals of the
Metropolitan Opera Company, and from 50 students of the
Juilliard Graduate School of Music. From facts obtained by

1 Campbell, F. L., G. W. Snedecor, and W. A. Simanton, “Biostatistical
Problems Involved in the Standardization of Liquid Household
Insecticides,” Journal of the American Statistical Association, Vol. 34 (1939),
pp. 62-80.

2 Scheinfeld, A., You and Heredity (1939).




questioning these persons the author concludes that their
talent is largely inherited. Many will welcome this trend toward
basing studies of man upon statistics of human beings rather
than upon statistics of vegetables or fruit flies.

Medicine. Much of the statistical work in biology has
application in the field of medicine, and interest in statistics
on the part of the medical profession has increased. In addition,
the medical profession has become interested in statistics on
economic and social welfare factors of importance in the control
of epidemics and of certain types of disease in the modern
community.¹ The practical advantages to the physician and to the
sanitarian of the development of medical statistics are very
great. Matters that were fiercely debated two generations ago
and concerning which only few physicians of a hundred years ago
could form an opinion are now a regular part of the knowledge
of a junior medical student through the study of mortality
statistics and vital statistics.² Indeed, the medical profession
in England has recently contributed a textbook on medical
statistics designed to acquaint medical students with the
fundamentals of statistical theory.³

Engineering. Since the success of their work depends not
only on the machines but on the human beings who operate them,
mechanical engineers have become increasingly interested in the
use of statistical method for making time studies in machine
operations. It is now realized that such studies cannot be
safely based upon some a priori scale of the machine’s capacity
or upon the record of only one or two operatives. Rather,
time-study data must be collected from an entire group of
operatives so that adjustments can be made according to the
effects upon operation of the human traits found by statistical
study to prevail in the machine or in the manner of operation.⁴

Two simple examples of the application of statistics to electrical
engineering are the study of elevator capacity for buildings and

1 Cf. Davis, Michael M., “Wanted: Research in the Economic and Social
Aspects of Medicine,” The Milbank Memorial Fund Quarterly, October, 1935,
pp. 339-346.

2 Cf. Pearl, Raymond, Introduction to Medical Biometry and Statistics
(1923), pp. 2-33.

3 Hill, A. B., Principles of Medical Statistics (1939).

4 Bergen, H. B., “Scientific Management in Unionized Plants,”
Mechanical Engineering, March, 1933, pp. 235-240.





telephone calls to be handled by an exchange. Statistics
regarding the number of passengers taken on at the first floor
are used to determine the time required for passengers to leave
the elevator, the round-trip time, and the number of passengers
carried by a given elevator. The most desirable type of elevator
equipment to install is determined from such data.¹

Since engineers are dealing with natural phenomena that
cannot be affected by human bias, many of their problems can
be solved approximately by the application of the principles
of probability. For example, during a long period of gaugings of
a stream the frequency of floods is often the best indication
of probable future floods. Such important engineering data
as forecasts of future floods, low annual rainfall, and consequent
depletion of storage reservoir can be estimated by applying
the theory of probabilities to statistics on the past history of
such events. From such data, the use of statistical technique
makes it possible to estimate the proper size of a hydroelectric
power plant and to predict its output and earnings.²
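
The flood-frequency reasoning can be sketched in a few lines of
Python. The gauge readings and the flood stage used here are invented
for illustration and come from no actual stream record. The sketch
assumes that one year is independent of the next and that the past
relative frequency is a fair estimate of the future probability; both
are simplifying assumptions, not claims made in the text.

```python
# Hypothetical annual peak gauge heights (feet) for 20 years; illustrative only.
annual_peaks = [11.2, 14.8, 9.6, 16.1, 12.3, 10.4, 15.2, 13.7, 9.9, 17.4,
                12.8, 11.5, 14.1, 10.8, 15.9, 13.0, 12.1, 16.6, 11.9, 10.2]
FLOOD_STAGE = 15.0  # assumed height at or above which a peak counts as a flood


def flood_probability(peaks, flood_stage):
    """Relative frequency of years whose peak reached flood stage."""
    floods = sum(1 for p in peaks if p >= flood_stage)
    return floods / len(peaks)


def prob_at_least_one(p_annual, years):
    """Chance of one or more flood years in a span, assuming independent years."""
    return 1.0 - (1.0 - p_annual) ** years


p = flood_probability(annual_peaks, FLOOD_STAGE)
print(f"estimated annual flood probability: {p:.2f}")
print(f"chance of at least one flood in 10 years: {prob_at_least_one(p, 10):.2f}")
```

A longer period of gaugings steadies the estimate, which is why the
length of the record matters as much as the arithmetic.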

One of the most striking illustrations of the use of statistics
in engineering is the control of the quality of manufactured
products.³ In ordinary manufacture, with the exception of
the making of optical or other precision instruments of infinite
refinement, all units of a product are not identical, in spite of
the vaunted standardization of products in industry. The cost
of so refining the machines or of so regulating their operation as
to make all units of product identical would be prohibitive
and in most cases unwarranted because of the low market value
of the product. Variations in quality are thus considered to be
justified, and it is the purpose of quality control to develop

1 Cook, H. B., “Selecting Elevators for an Office Building,” Power, Mar. 8,
1932, pp. 404-408.

2 Creager, W. P., and J. D. Justin, Hydro-electric Handbook (1927),
pp. 43 and 171. For other illustrations of the use of statistics in engineering
science and art, see G. W. Hubbard, “Investigation of Errors of Pitot
Tubes,” Transactions of the American Society of Mechanical Engineers,
August, 1939, pp. 477-506; H. K. Barrows, Water Power Engineering (1927),
pp. 54-57.

3 Shewhart, W. A., Economic Control of Quality of Manufactured Product
(1931). Since Shewhart’s pioneer efforts on this important subject, much
progress has been made, so much that one might say a new craft has been
created.





statistical means of showing the actual statistics of variations
in quality, the economically permissible variations in quality,
and the statistical measurement of ways of locating and
correcting causes of quality variation beyond set limits. Such
control is designed to reduce the number of products that
must be discarded as below standard; consequently, if successful,
quality control reduces waste and lowers manufacturing costs
per unit of output. In addition, selling costs are reduced and
good will improved, because quality control decreases the
number of customers who become dissatisfied as the result of the
inconvenient necessity of returning inferior products.
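
The idea of "set limits" on quality variation can be illustrated with
a short Python sketch. The measurements are hypothetical, and the rule
of placing the limits three standard deviations on either side of the
mean is an assumption adopted here for illustration; the text itself
does not say how the limits are to be fixed.

```python
from statistics import mean, stdev

# Hypothetical diameters (mm) from a period when the process ran normally.
baseline = [10.02, 9.98, 10.01, 9.97, 10.03, 10.00, 9.99, 10.02, 9.98, 10.00]

center = mean(baseline)
sigma = stdev(baseline)  # sample standard deviation of the baseline
# Assumed rule: limits at three standard deviations from the mean.
lower, upper = center - 3 * sigma, center + 3 * sigma


def out_of_control(measurements, lo, hi):
    """Return (index, value) pairs that fall outside the set limits."""
    return [(i, x) for i, x in enumerate(measurements) if x < lo or x > hi]


# New production readings; the last one is deliberately far off.
new_readings = [10.01, 9.99, 10.02, 10.15]
flagged = out_of_control(new_readings, lower, upper)
print(f"limits: {lower:.3f} to {upper:.3f}")
print("readings needing investigation:", flagged)
```

Readings inside the limits are treated as ordinary variation; only
those outside call for a search for an assignable cause.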

Although used in the American Telephone and Telegraph
Company under the leadership of Dr. Shewhart, application
of statistical quality control has been negligible in the United
States.¹ In Great Britain, however, the idea of statistical
quality control was accorded an enthusiastic reception following
Shewhart’s visit to London in 1932. A committee headed by
Dr. E. S. Pearson was organized by interested British
industrialists to consolidate previous progress and facilitate adoption
of the technique.² By 1937 in England the methods had been
applied to coal, coke, cotton yarns, cotton textiles, woolen
textiles, spectacles glass, lamps, building materials, and
manufactured chemicals.³

Physics and Chemistry. There is no dispute among modern
physicists and chemists as to the importance of statistical
methods in their sciences. Even the highly metaphysical Sir
Arthur Stanley Eddington in his Nature of the Physical World
(1928) attaches great importance to statistics in the natural

1 Two reasons have been given for this failure of statistical quality control
to be applied in the United States: [first,] a deep-seated conviction of
American production engineers that their principal function is so to improve
technical methods that no important quality variations remain, and that in
any case the laws of chance have no proper place among modern ‘scientific’
production methods; second, the difficulty of obtaining industrial
statisticians who are adequately trained in this fairly complicated field.
Freeman, H. A., “Statistical Methods for Quality Control,” Mechanical
Engineering, Vol. 59 (1937), pp. 261-262.

2 Pearson, E. S., The Application of Statistical Methods to Industrial
Standardization and Quality Control (British Standards Institution, London,
1935).

3 Freeman, op. cit., pp. 261-262.





sciences. In fact, he says that the laws of nature divide
themselves into three classes: (1) identical laws, such as the law of
conservation and the law of gravitation; (2) statistical laws,
such as Boyle’s law, the second law of thermodynamics, and
quantum laws; and (3) transcendental laws, which are genuine
laws of control in the physical world.¹

In physics, statistical technique is employed in the study of
molecules. This modern statistical approach has a philosophical
background that goes back at least as far as Boltzmann, who
in 1866 expressed the second law of thermodynamics in terms of
probabilities. His contribution was regarded as a form of
mysticism until it was demonstrated by research during the
first two decades of the twentieth century.² At the turn of the
twentieth century Max Planck was trying to explain why pieces
of matter heated to high temperatures emit more light of one
wave length than of any other and less light at both larger and
shorter wave lengths. He could not explain this phenomenon
except by supposing that light is emitted by atoms not as
continuous trains of electromagnetic waves but in discrete bundles
of energy that he called “quanta.” Similar experimental
work accompanied by new theoretical contributions, notably
those of Heisenberg, led to the formulation of the modern
statistical approach to the natural sciences. Within three
decades this new theory has come into widespread practical
use also, having found application in explanation of the behavior
of photographic plates, the conduction of electricity through
wires, the conduction of heat through walls, the behavior of
photoelectric cells, the manner of emission and absorption of
light by atoms and molecules, and in the theory of metals.³

As explained by a recent popularizer of the natural sciences,⁴
Newtonian mechanics succeeds in accurately predicting motion

1 Stebbing, L. S., Philosophy and the Physicists (1937), p. 70.

2 Haas, Arthur, The New Physics (1923), pp. 38-44.

3 Cf. Eldridge, J. A., The Physical Basis of Things (1934), pp. 357-358.

4 De Broglie, Louis, Matter and Light: The New Physics, translated by
W. H. Johnston (1939). For a popularized description of the experimental
development based upon Boltzmann and later Heisenberg’s theories, see
also Eddington, op. cit. Cf. William M. Malisoff, review of De Broglie’s
Matter and Light, in The New York Times Book Review, Oct. 1, 1939. Also
see H. Lifschutz and O. S. Duffendack, “The Counting Losses in
Geiger-Müller Counter Circuits and Recorders,” Physical Review, Vol. 54 (Nov. 1,





occurring on the human scale and also on the scale of celestial
bodies. In other words, Newtonian mechanics does well for
macroscopic measurements. But in the investigations of the
motion of the microscopic particles inside the atom, Newtonian
mechanics ceases to have value, while quantum mechanics makes
it possible to grasp the meaning of new principles that must
necessarily be introduced in these more minute analyses. The
principles referred to are statistical in nature and are based
in large part on the theory of probability.¹ It is impossible
to measure several physical quantities (as energy, position,
momentum) accurately at the same time. It is this necessary
inexactness that has forced us to find our ultimate laws in
probabilities.

It must not be supposed that the new statistical approach,
which is said to have been derived from Heisenberg’s uncertainty
principle, necessarily has thrown into chaos concepts of physical
measurement. The admission that laws in quantum mechanics
are statistical may destroy the idea that the universe is a huge
machine, but in a given case, with the initial conditions
determined as precisely as the principle of uncertainty permits, the
probability of all subsequent states is determined by exact
mathematical probabilities. There is nothing lawless in quantum
phenomena.² Analysis shows, moreover, that the theoretical
uncertainty, which prohibits a simultaneous accurate measurement
of position and of velocity, is noticeable only in dealing with
the very minute masses of the subatomic world. With ordinary
masses the theoretical uncertainty, though still existing, falls
below the practical uncertainties, which are due to the
imperfection of human observations, and is completely submerged by the
latter. This gradual obliteration of the quantum uncertainties,
as the scientific observer passes to the commonplace level of
average masses, is the reason why Newtonian mechanics is still
used. For the small velocities and relatively large masses with


1938), pp. 714-725; A. Ruark, “The Time Distribution of So-called Random
Events,” Physical Review, Vol. 56 (Dec. 1, 1939), pp. 1160-1167; E.
Rutherford, Radiations from Radioactive Substances (1930), Chap. VII,
pp. 171-172.

1 Eldridge, op. cit., p. 376. Cf. Tolman, R. C., The Principles of
Statistical Mechanics (1938), p. 65.
2 Stebbing, op. cit., p. 183.





which the scientist is usually concerned, the two mechanics yield
results so nearly alike that in practice no experiment would be
sufficiently refined to detect the difference. Since the Newtonian
mechanics is mathematically the simpler, there is every advantage
in retaining it.¹
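
A rough calculation shows how quickly the theoretical uncertainty
fades as mass increases. The Python sketch below uses the relation in
its usual modern form, in which the uncertainty in position multiplied
by the uncertainty in momentum is at least half the reduced Planck
constant. That formula, the constants, and the two example bodies are
supplied here for illustration and do not appear in the text.

```python
HBAR = 1.054571817e-34  # reduced Planck constant, joule-seconds


def min_velocity_uncertainty(mass_kg, position_uncertainty_m):
    """Lower bound on velocity uncertainty from dx * (m * dv) >= hbar / 2."""
    return HBAR / (2.0 * mass_kg * position_uncertainty_m)


# An electron confined to roughly the width of an atom (illustrative figures).
electron = min_velocity_uncertainty(9.1093837e-31, 1e-10)
# A one-gram bead located to within a thousandth of a millimeter.
bead = min_velocity_uncertainty(1e-3, 1e-6)

print(f"electron: at least {electron:.2e} m/s")
print(f"one-gram bead: at least {bead:.2e} m/s")
```

For the electron the bound is a sizable velocity; for the bead it lies
far below anything an instrument could register, which is the sense in
which the quantum uncertainty is submerged by the practical one.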

Except for Einstein’s theory of relativity there has been nothing
so to stir the imagination of the natural scientists in the twentieth
century as this new statistical approach. In fact, one writer has
said that the entire structure of modern physics and chemistry,
and therefore of all the natural sciences to which they are
fundamental, rests upon quantum mechanics.²

From the above discussion it is readily apparent that statistical
techniques are helpful, not only to theories in the natural and
social sciences, but to the arts dependent on those sciences. Yet
for many students the most important reason for knowing
something about the fundamentals of statistical method is the need for
intelligent discrimination between the proper and improper use of
statistics. Unfortunately, a large portion of the extensive
modern employment of statistics in all fields falls under the latter
heading. This is especially true in popular presentations of
modern scientific and political matters. Too close attention to
the mechanics of a method and the neglect of common sense are
responsible for a large number of these horrible examples. All too
often, preoccupation with the technique dims common sense.

Statistics and Philosophy. Nineteenth-century cocksureness
of the scientific approach, pretending to such a degree of precision
and to such broad scope as to annihilate the foundations for
ethical, moral, and religious faiths, has largely disappeared.
Under the aegis of the assertive and materialistic science of the
nineteenth century, belief in free will was dwindling to a mere
superstition; but the element of indeterminacy brought into
science as a result of the application of the theory of probabilities
again permits freedom. This decline of mechanistic assurance in
science has not been ignored by philosophic thought, which has
emphasized as never before a lesson that has often recurred in the
history of philosophy: objective reality is not always identical
with subjective concepts.³ Eddington expresses these doubts

1 D’Abro, A., The Decline of Mechanism (1939), pp. 37-57.

2 Harrison, op. cit., pp. 341-342.

3 Eldridge, op. cit., pp. 379-380.



in the following words:¹ “Does the spectroscope find the colors,
or does it make them? When the late Lord Rutherford showed
us the atomic nucleus, did he find it or did he make it? How
much do we discover and how much do we manufacture by our
experiments?”

Just as surely as the railroad destroyed the supremacy of the
stagecoach or the radio eclipsed the popularity of the phonograph,
so have the discoveries of modern science eclipsed faith in many
ideals and beliefs that served to give reason to the lives of the
masses of the people. In the realm of ethical and moral values,
buttressed by the dogma of a bygone age, nineteenth-century
scientific method was almost wholly destructive and hardly at all
constructive. Modern philosophy criticized scientific method,
both the laboratory and statistical branches, for failing to provide
new moral values to replace outmoded prescientific ones. Despite
this gloomy aspect, philosophy’s greatest spokesmen look to
scientific method itself to obtain the necessary enlargement
of the conception of human nature and the formulation of the
required new moral values. John Dewey envisages the use of
scientific method to create a comprehensive democratic culture as
a guarantor of genuine freedom.²

SUMMARY 

Statistical method (the quantitative expression of knowledge,
the marshaling of facts and their arrangement in a form suitable
for scrutiny) has been the means employed by businessmen,
natural scientists, and social scientists to establish bases for
judgments regarding factual data so complex or so numerous as
to be, in the unmarshaled state, intellectually incomprehensible.
Commercial statistics and their interpretation may, indeed, be
said to constitute the scientific background of business today.
Men cannot conduct their business intelligently without them.
Quite as important as statistics of commerce and trade are the
more recently developed industrial and social statistics: data on
employment and pay rolls in industry, trade, and finance and on
the distribution of income.

In the science of government and its practical art the
statistical approach has proved itself essential as facts have
accumulated and, to an increasing extent, as means have been developed
for making the quantitative units of measurement required.
The importance of statistical techniques in the natural sciences is
attested by the definition of “science” so familiar to every
schoolboy: Science is systematized knowledge. Statistical
mechanics is essential to an understanding of modern physics
and chemistry. Whatever the individual’s station or calling, he
is to a greater or lesser extent using statistical techniques.

1 Op. cit., pp. loa-ioo

2 Freedom and Culture (1939), passim.



CHAPTER II
GATHERING STATISTICS

Before the commercial revolution of the sixteenth century,
social and economic life was relatively simple. The small villages
and towns were self-sufficing economic and social units. Little or
no statistical enumeration of facts was required to comprehend
the extent of the population, the number of buildings, the number
of cattle, and the quantity of other constituent units in the
community. Within the limited range of space and time usually
contemplated, events having to do with the welfare or distress of
the community were not complex. Judged by modern standards,
government was simple and inexpensive because social and
economic relationships were not complicated. Even the great
cities of the time were not large compared with modern
metropolitan districts. In population, wealth, and trade, the extent
of a sixteenth-century nation was inconsiderable and, furthermore,
was growing almost imperceptibly. In other words, conditions
were relatively simple and static.

Genesis of Fact Marshaling. Under such conditions, little
was done in the way of the systematic gathering and analysis
of statistical data. The situation did not demand the
continuous assembling of up-to-the-minute facts. Indeed, it was
not profitable to do so. The motive did not exist in sufficient
force to direct attention to the problem of expressing
quantitatively the events of contemporary social and economic life,
and the facts of the natural sciences were obscured in medieval
mysticism or cherished from a forgotten age by a few scattered
and scholarly churchmen. Nevertheless, it was found useful
on occasions to make great surveys that could subsequently
serve as the basis of governmental decision in regard to taxation
and other social activities, and that might also be a guide to
private enterprise. Pepin the Short in 758 and Charlemagne
in 762 demanded detailed descriptions of church lands, while
several works written in France during the first half of the ninth


century gave a partial enumeration of the serfs attached to the
land.¹ Likewise, when William the Conqueror undertook the
reorganization of the national government of England in the
eleventh century, he found it desirable to make his famous
survey, which resulted in Domesday Book, completed about
1086.² As early as the fourteenth century, the medieval
guilds gathered statistics in connection with their regulation of
markets.³ Later, in the fifteenth century, as the breakup of
the medieval system gathered momentum and as the rise of
trading groups accelerated, there was a great increase in the
amount of statistical work done by guilds as well as by central
governments, the latter not infrequently through guild
organizations or through the Church. Economic statistics were collected
when the occasion demanded, for example, when the upsetting
of a customary price by a flood or drought required explanation
and the determination of a new customary price. Although
there is evidence that in these several ways statistics were
assembled, they were neither methodically made nor preserved.
There are isolated instances of the registration of deaths or
baptisms in the fourteenth and fifteenth centuries, but it was
not until the sixteenth century that any considerable movement
toward statistical enumeration of facts occurred.

Development of a Dynamic Social Order. During the Renais- 
sance, from thirteenth-century Italy to fifteenth-century Spain 
and England, the quantity of data in the physical sciences 
steadily accumulated from experimental efforts of astronomers 
and other scientists. The most dramatic of all human experi- 
ments was made by the voyagers seeking to prove that the world 
is round. The discovery of America and the voyages of exploration
of the sixteenth century gave great impetus to the development
of trade and the growth of nations.⁴ Motivated by the
economic ideals of mercantilism, a period of trade development 
followed, the domestic system of manufacture rapidly expanded, 

1 Walker, Helen M., Studies in the History of Statistical Method (1929),
pp. 32-33. The History of Statistics (compiled and edited by John Koren,
1918).

2 Cheyney, Edward P., An Introduction to the Industrial and Social
History of England (1925), pp. 17-18.

3 Faure, Fernand, “The Development and Progress of Statistics in
France,” in Koren, History of Statistics, pp. 229-233.

4 Faulkner, H. U., American Economic History (1931), pp. 34-57.



and the colonial empires of Portugal, Spain, France, Holland,
and then England emerged. The change was from haphazard
and occasional trade of the merchant adventurers of the sixteenth
century to the more or less systematic and regular international
and intercolonial trade of the seventeenth and eighteenth centuries.
Along with this trade development came the necessity
for obtaining more regular information concerning markets,
population, wealth, prices, and the movements of merchandise
and gold. Furthermore, with this growth of trade both in
volume and in complexity, governmental and social organization
became more complex.

As the fact of change was revealed by the events of the commercial
revolution, national governments began to feel the
need of more regular fact finding in order to visualize and to
interpret changing conditions. Yet it must not be supposed
that well-organized or any considerable amount of statistical
data for the sixteenth or even the seventeenth century can now
be found. It was more a case of an awakening of the will to do
rather than a case of actual accomplishment. For it was really
the Industrial Revolution and the vigorous growth that took
place in the eighteenth and nineteenth centuries that gave the
actual impetus to systematic marshaling of quantitative facts.
It was not until the early part of the nineteenth century, indeed,
that most of the essential principles of statistical method, even
for purely descriptive purposes, had been evolved. Also, the
compilation and current use of statistics as practiced today
have been made possible only by the growth in transportation
and communication facilities, a nineteenth-century phenomenon.
It was also from the eighteenth century onward that the achievements
of scientists in accumulating experimental data for the
natural sciences fired the imagination of scholars to solve the
problem of data accumulation for the sciences of life and social
behavior.

Quantitative Expression. Where there are large
populations, great nations of tens of millions of people, all
problems of social, economic, and political organization are
increased many times in complexity and, furthermore, new
problems arise. The problem of feeding such large populations,
the problem of housing them, the problem of keeping them
employed and preventing them from harming each other, to


mention but a few of the considerations confronting the governmental
administrator, all these are vastly complex owing to the
great expanse of geographical space covered and the varying
conditions at different places and times. In simple economies
many of these problems can be solved by permitting individual
freedom of choice and free economic enterprise; but as the
community becomes more and more closely knit in economic
and social relations and as various forms of economic power
emerge, individual freedom of choice and free economic enterprise
become goals that must be consciously sought by organization
rather than natural tendencies that develop unaided.

In the intricate social and economic organizations of the
modern era, it is inconceivable that any individual or group of
individuals can obtain the knowledge necessary to form judgments¹
concerning the issues that arise. An individual can
comprehend only those conditions within a reasonable geographical
area about him; the more complicated society is, the
smaller the area about him that he can understand without the
use of statistics. “We are overwhelmed, not only by the diversity
of knowledge, but also by the diversity of possible deeds, of
possible values, and of possible judgments,” and, further, “this
human mind, whose needs Plato so perfectly understood, still
insists upon constructing for itself a fixed world in the midst
of a fluid one. It persists in thinking in terms of aims and
ends and perfections; of ideals, of purposes, and final goods;
and, at the very last, it insists upon assuming some direction
in change, something toward which the chain of events is
moving.”²

In this effort it is impossible for the individual to survey the
conditions qualitatively; it would take him many human lifetimes
to inspect the whole population, and the capacity of the
human brain is not adequate to the task of absorbing so complex
an impression. If he attempts a microscopic survey, he is
quickly smothered by overwhelming detail. If he attempts a
macroscopic survey without the use of statistics, he is compelled

1 For the complexity of modern society, as it is reflected in statistics, see
publications of the United States Bureau of the Census. For suggestive
special studies, see Corrington Gill, Wasted Manpower (1939); Henry Pratt
Fairchild, People (1939).

2 Krutch, J. W., Art and Experience (1932), pp. 121, 211.



to resort to guesswork and commonly originates “cloud-pushing”
fantasies. Furthermore, the individual’s personality tends
to bias him not only in his observations but also in his judgments.
If he is temperamentally inclined to be impressed by sordid
things, he is likely to notice them more than the good things in
his surroundings, and his judgment is correspondingly pessimistic.
On the other hand, if he is temperamentally optimistic,
he tends to consider as the rule the good things and to regard
as unusual the sordid things of life. Where it is necessary to
gain knowledge concerning large populations of people and
things, where social and economic life is complex, there is need
to use statistics.

Rational Basis for Gathering Data. Quixotically accumulating
data is not to be confused with scientific fact gathering. The
progressive accumulation of useful quantitative facts has been
stimulated and furthered by a definite conscious purpose.
To look at the process historically, it was the rise of nationalism
and the mercantilistic ideal that supplied definite purpose for
the fact-finding inquiries of the eighteenth-century political
arithmeticians. Modern survivals of the same nationalistic
and mercantilistic ideals impel governments to spend vast
sums for the collection of statistics designed to measure the
wealth and material position of the nation and to furnish business
enterprise with facts about markets. Underlying much of this
effort abides also a sincere interest, stimulated by scientific
research, in real human welfare. As a consequence, the modern
census attempts to collect quantitative facts directly or indirectly
concerning the health and morals of the nation’s people. The
subsequent usefulness of such statistical data depends upon
how well the simple rules of common sense have been followed
in assembling and in presenting the data.

Units of Description and Measurement. The units of description
or measurement by means of which quantitative facts
are to be assembled must be carefully defined. When defined,
such units must be strictly applied during the assembling of the
data and in all subsequent analysis. It is accordingly of the
utmost importance clearly and fully to describe units of description
and measurement in all subsequent use of the data. Such
rules are so plain and so easily resolved into matters of simple
common sense that it seems almost a waste of time to direct



attention to them; yet to follow them is not always so simple a
matter as might be supposed.

For example, in 1940 thousands of enumerators undertook
the task of counting the population of the United States, of
counting the number of farms, farm animals, and all other types
of wealth, and of obtaining specified information concerning
every person living in the United States. One may ask: Why
should mere counting be a complicated task? This question
would be quickly answered by the farmer’s boy who has just
finished trying to count the number of chickens in his pens.
Everything would be easy if they would only stand still. People,
as well as chickens, do not stand still while they are being
counted, and simple matters mount up into a veritable host of
intricate difficulties. Suppose you were an enumerator and
in the first house approached you found that the mother is in
the maternity hospital, a baby was born at 10 A.M. of the census
day, one son is away at college in another state, a daughter is
boarding and rooming in a neighboring town, where she teaches
school, and the father is in jail for evading income taxes. On
several points you would feel the need for very specific instructions.
To avoid double counting or the failure to count many
individuals, instructions to the enumerators must be given with
great care; every possible complication must be foreseen.

In recording facts about manufacturing and trading, or
merchandising, enterprises in separate categories, when is an
enterprise a manufacturing concern and when is it a trading,
or merchandising, concern? In recording statistics about farms,
when is a farm not a farm but a truck garden? These few
examples are probably sufficient to emphasize the point that the
unit must be carefully defined and that the defined unit must be
strictly followed and freely or even religiously disclosed to all
who in the future use the statistics.

Carefully planned schedules of questions, often called “questionnaires,”
are the principal means of gathering statistics.
These vary from schedules simple enough for oral presentation,
as frequently utilized in polls, to the elaborate forms used by the
government or research organizations. In the first phase of
statistical investigation, the gathering of facts, care in following
all the rules of common sense and logical definition is epitomized
in the formulation of the questionnaire, or schedule.



QUESTIONNAIRES OR SCHEDULES 

Official Example of Care in Description of Units. In taking the
United States census, for example, the assurance of accuracy in
regard to these important but detailed matters is guarded by a
skillfully organized system. Forms are supplied, with column
arrangement for writing in all the required information. A
question appears at the head of each column, and the columns,
and therefore the questions, are grouped into subjects; thus in the
schedule for the 1940 population census there are 34 columns
grouped under the subjects location, household data, name, relation,
personal description, education, place of birth, citizenship,
residence, and employment status. In addition, columns 35 to 50
contain supplementary questions to be asked only a sample of all
the persons enumerated. Figures 1 to 3 show in three sections
the 34 questions asked of all persons. Figure 4 is included to
show the questions on employment status in the 1930 census,
revealing thereby the great elaboration of this type of question in
the 1940 census.

Sample forms that had been filled in with illustrative answers
were supplied to enumerators, and a complete, simple description
of the manner in which the form was to be filled out was printed on
the sample schedule. Pamphlets were issued for the use of
enumerators, giving detailed instructions. For the 1940 census
there was issued to enumerators taking the census of population
and agriculture a printed and indexed pamphlet 173 pages long.
This gave detailed definitions and described procedure for enumerators
to follow under the various circumstances that might
arise in their house-to-house canvasses.

Moreover, the enumerators worked directly under experienced
district supervisors, who, in turn, were under area managers
responsible to the Bureau of the Census in Washington. To train
the 529 district supervisors in the 1940 census taking and census
procedures, a picked group of more than one hundred men from all
parts of the country were given a special course of instructions in
Washington. Those who passed the examination were sent out as
area managers to the 104 census areas, each to direct the training
of five to seven district supervisors and to act as regional manager
between them and the Bureau of the Census in Washington.

Fig. 1. The first 12 questions on the 1940 census schedule grouped under
topics location, household data, name, relation, and personal description.
Notice the sample entries.

Fig. 2. ... the 1940 census schedule grouped under topics education, place
of birth, ...

Fig. 3. Questions 21 to 34 on the 1940 census schedule grouped under topic
employment status of persons fourteen years old and over, with subgrouping
indicated in captions of columns. Notice the sample entries.

Fig. 4. (Caption illegible.)

The 529 districts were broken into enumeration districts, of
which there were about 147,000. Generally speaking, there was
one enumerator in each of these districts, but in certain regions
one enumerator covered more than one district. Therefore,
about 123,000 enumerators were used. Wide publicity for such
careful preparation in the case of the 1940 census resulted from
Congressional protests about some of the new questions.¹
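The organizational figures quoted above can be checked with a few lines of arithmetic. The following is only a sketch in Python; the constant names are invented for the illustration, and the counts are the rounded figures given in the text ("about 147,000," "about 123,000").

```python
# Rounded figures quoted in the text for the 1940 census organization.
CENSUS_AREAS = 104
SUPERVISOR_DISTRICTS = 529
ENUMERATION_DISTRICTS = 147_000   # "about 147,000"
ENUMERATORS = 123_000             # "about 123,000"


def ratio(numerator, denominator):
    """Return numerator / denominator rounded to two decimals."""
    return round(numerator / denominator, 2)


# Average number of district supervisors under each area manager.
supervisors_per_area = ratio(SUPERVISOR_DISTRICTS, CENSUS_AREAS)
# Average number of enumeration districts covered by one enumerator.
districts_per_enumerator = ratio(ENUMERATION_DISTRICTS, ENUMERATORS)
# Average number of enumeration districts under one district supervisor.
districts_per_supervisor = ratio(ENUMERATION_DISTRICTS, SUPERVISOR_DISTRICTS)

print(supervisors_per_area, districts_per_enumerator, districts_per_supervisor)
```

The first ratio comes out a little above five, which agrees with the statement that each area manager directed five to seven district supervisors; the second shows that, on the average, an enumerator covered somewhat more than one enumeration district.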

To illustrate the necessity for careful definition of units and
description of procedure and to solve the census problem of the
amazing family described above, the following is quoted from the
Instructions to Enumerators:²

Who Is to Be Enumerated in Your District 

300. The problem of who is to be enumerated in your district is 
extremely important. Therefore, study very carefully the following 
rules and instructions. 

301. The Census Day. There should be a return on the population
schedule for each person alive at the beginning of the census day, i.e.,
12:01 A.M. on Apr. 1, 1940. Thus, persons who died after 12:01 A.M.
should be enumerated; and infants born after 12:01 A.M. on Apr. 1, 1940,
should not be enumerated.

302. Usual Place of Residence. Enumerate every person at his 
“usual place of residence.” This means, usually, the place that he 
would name in reply to the question “Where do you live?” or the place
that he regards as his home. As a rule, it will be the place where the 
person usually sleeps. 


Persons to Be Enumerated in Your District 

304. Enumerate all men, women, and children (including infants) 
whose usual place of residence is in your district or who, if temporarily in 
your district, have no usual place of residence elsewhere. Persons who 
move into your district after Apr. 1, 1940, for permanent residence 
should be enumerated by you, unless you find that they have already 
been enumerated in the district from which they came. 

305. Residents Absent at Time of Enumeration. Some persons having 
their usual place of residence in your district may be temporarily 
absent from the household at the time of the enumeration. These you 
must enumerate with the other members of the household, obtaining 
the information regarding them from their families, relatives, acquaint- 
ances, or other persons able to give it. However, do not include with 

1 The New York Times, Feb. 27-29, Mar. 1-3, 1940.

2 Bureau of the Census, Instructions to Enumerators, Population and
Agriculture, pp. 14-18, 80-81.



the household a son or daughter permanently located elsewhere or
regularly employed elsewhere and not sleeping at home.

306. Persons to be counted as members of the household include the
following:

a. Members of the household temporarily absent at the time of the
enumeration, either in foreign countries or elsewhere in the United
States, on business or visiting.

b. Members of the household attending schools or colleges located in
other districts, except student nurses away from home and students in
the Naval Academy at Annapolis or in the Military Academy at West
Point or in any other training school or institution operated by the
War or the Navy Departments or the United States Coast Guard.

c. Members of the household who are in a hospital or a sanitarium
but who are expected to return in a short period of time.

d. Servants or other employees who live with the household or sleep
in the same dwelling.

e. Boarders or lodgers who sleep in the house.

f. Members of the household enrolled in the Civilian Conservation
Corps.

307. In the great majority of cases the names of absent members will
not be given to you by the persons furnishing the information unless
particular attention is called to them. Before finishing the enumeration
of a household, therefore, you should ask the question: Are there
any members of the household who are absent?


Persons Not to Be Enumerated in Your District

313. There will be a certain number of persons present and perhaps
lodging and sleeping in your district at the time of the enumeration who
do not have their usual place of residence there. As a rule, do not
enumerate as residents of your district any of the following classes,
except as provided in paragraph 314:

a. Persons temporarily visiting with the household. If, however,
they do not have any usual place of residence from which they will be
reported, they should be enumerated with the household.

b. Households temporarily in your district which have a usual place
of residence elsewhere from which they will be reported.

c. Transient boarders or lodgers who have some other usual or permanent
place of residence, that is, who have a house or apartment elsewhere
in which they usually reside and where they will be enumerated.

d. Persons from abroad temporarily visiting or traveling in the United
States and foreign persons employed in the diplomatic or consular service
of their country (see paragraph 331). (Enumerate other persons



from abroad who are students in this country or who are employed here, 
however, even though they do not expect to remain here permanently.) 

e. Students or children living or boarding with this household in
order to attend some school, college, or other educational institution in 
the locality but who have a usual place of residence elsewhere from which 
they will be reported. 

f. Persons who take their meals with the household but usually lodge
or sleep elsewhere. 

g. Servants or other persons employed by the household but not 
sleeping in the same dwelling.

h. Persons who were formerly members of this household but have 
since become inmates of a jail or a mental institution, home for the
aged, infirm, or needy, reformatory, prison, or any other institution in 
which the inmates may remain for long periods of time. 

i. Transient patients of hospitals or sanitariums. Such patients are 
to be enumerated as residents in the households of which they are mem- 
bers and not as residents in the institution, unless they have no other 
place of residence at which they will be reported. 

314. When to Make Exceptions. In deciding when to make excep- 
tions to the rules indicated above, consider whether the household or 
persons temporarily residing in your district will be reported at another 
place of residence by some person in a position to supply the information 
required. If the persons or household will not be so reported, enumerate
them as residents of your district. 

Enumeration of Special Classes of Persons 

315. You may experience some difficulty in determining whether to
enumerate certain special classes of persons indicated below. In any 
instance in which you are not sure whether to include persons as resi- 
dents of your district, ask your squad leader or supervisor for further 
instructions. 

316. Servants. Enumerate with the household any servants, laborers, 
or other employees who live with the household and sleep in the same 
house or dwelling unit. However, enumerate servants who sleep in 
separate and completely detached dwellings as separate households, 
even though the dwelling is on land owned by members of the household 
by which the servants are employed. 


318. Students at School or College. If there is a school, college, or
other educational institution in your district that has students from
outside your district, enumerate as residents of the school only those
students who have no usual places of residence elsewhere. Especially



in a university or professional school there will be a considerable number
of the older students who are not members of any household located
elsewhere. Find and enumerate all such persons.

319. Schoolteachers. Enumerate teachers in a school or college at
the place where they live while engaged in teaching, even though they
may spend the summer vacation at their parents’ home or elsewhere.

320. Student Nurses. Enumerate student nurses as residents of the
hospital, nurses’ home, or other place in which they live while they are
receiving their training.

321. Patients in Hospitals, Sanitariums, and Convalescent Homes.
Most patients in hospitals, sanitariums, and convalescent homes are
there temporarily and have some other usual place of residence. Enumerate
patients as residents of such an institution only if they have no
other place of residence from which they will be reported. A list of
persons having no permanent homes can usually be obtained from the
institution records.

322. Inmates of Prisons, Asylums, and Institutions Other than Hospitals.
Your district may include a prison, reformatory, or jail; a home
for orphans, for aged, infirm, or needy persons, for blind, deaf, or incurable
persons; a soldiers’ home; an asylum or hospital for the insane or the
feeble-minded; or a similar institution in which the inmates usually
remain for long periods of time. Enumerate all the inmates of such
institutions at the institutions. Note that in the case of jails you must
enumerate the prisoners there, however short the sentence.


Census of Agriculture
General Information

Purpose of the Census of Agriculture. An act of Congress provides
that a census of agriculture be taken every 5 years, for the purpose of
obtaining basic information on farm acreage, land values, crops, livestock,
and other general items relating to agriculture. The Sixteenth
Census, which will be taken as of Apr. 1, 1940, will include comprehensive
information on agriculture, including irrigation and drainage of
farm land.

Every enumerator must fill out a farm and ranch schedule for each
tract of land in his enumeration district that might classify as a “farm”
under the census classification, giving all the requested information.
The information should be obtained by a personal visit of the enumerator.
It is absolutely necessary that the census be complete and accurate.
Census data are widely used by both private and public agencies
and often form the basis for legislative and administrative programs.



The farmer should be made to feel that his contribution to the census
is of real value to himself and to his community.

Census Schedules Are Confidential. The Federal law providing for the
census prescribes heavy penalties for revealing information to unauthorized
persons. The enumerator should make it clear, in dealing with
persons who seem unwilling to give the information requested, that he
is not allowed to give any information to their neighbors or other
persons; that only sworn census employees will have access to the farm
schedules; and that those records for individual farms cannot be used
for purposes of taxation, regulation, or investigation.


Definition of a Farm. The definition of a farm found on the face of
the schedule must be carefully studied by the enumerator. Note that
for tracts of land of 3 acres or more the $250 limitation for value of
agricultural products does not apply. Such tracts, however, must have
had some agricultural operations performed in 1939 or contemplated in
1940. A schedule must be prepared for each farm, ranch, or other
establishment that meets the requirements set up in the definition. A
schedule must be filled out for all tracts of land on which some agricultural
operations were performed in 1939 or are contemplated in 1940
and which might possibly meet the minimum requirement of a “farm.”
When in doubt, always make out a schedule.


You now have instructions that will help enumerate the interesting
family first encountered above: the mother will be enumerated
(paragraph 321), the baby will not be enumerated
(paragraph 301), the son will be enumerated (paragraph 318), the
daughter will not be enumerated (paragraph 319), and the father
will not be enumerated at the household, although if the jail is in
town he will there be enumerated (paragraph 322).
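The reasoning applied to this family can be set down as a small program. What follows is a simplified sketch in Python, not the Bureau's procedure: the status labels, the class and function names, and the birth dates are all hypothetical, and only the few rules needed for this one household (paragraphs 301, 306, 319, 321, and 322) are encoded. Deaths on the census day are left out for brevity.

```python
from dataclasses import dataclass
from datetime import datetime

# Paragraph 301: the census moment is 12:01 A.M. on Apr. 1, 1940.
CENSUS_MOMENT = datetime(1940, 4, 1, 0, 1)

# Hypothetical status labels for this sketch; they are not census terms.
# These are the statuses that keep a person with the household.
COUNTED_WITH_HOUSEHOLD = {"at_home", "away_at_college", "short_hospital_stay"}


@dataclass
class Person:
    name: str
    born: datetime
    status: str = "at_home"


def born_by_census_moment(person):
    """Paragraph 301: infants born after the census moment are not counted.

    Treating a birth exactly at 12:01 A.M. as counted is an assumption.
    """
    return person.born <= CENSUS_MOMENT


def enumerate_with_household(person):
    """Simplified reading of paragraphs 301, 306, 319, 321, and 322."""
    return born_by_census_moment(person) and person.status in COUNTED_WITH_HOUSEHOLD


# Birth dates other than the baby's are invented for the illustration.
family = [
    Person("mother", datetime(1900, 1, 1), "short_hospital_stay"),
    Person("baby", datetime(1940, 4, 1, 10, 0)),
    Person("son", datetime(1920, 1, 1), "away_at_college"),
    Person("daughter", datetime(1918, 1, 1), "teaching_in_another_town"),
    Person("father", datetime(1895, 1, 1), "in_jail"),
]

results = {p.name: enumerate_with_household(p) for p in family}
print(results)
```

The point of the sketch is that every decision depends on a definition fixed in advance: the census moment and the list of absences that still count as residence. Change either definition and the count for the same household changes.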

Figures 5 and 6 are photographic reproductions of parts of the
Farm and Ranch Schedule used for the census of agriculture. On
Fig. 6 appears the definition of a farm to which reference is made
in the general information quoted above from the manual of
instructions. Altogether the farm and ranch schedule contains
232 questions on 16 subjects. The subjects include information
about the operator, farm acreage, values, farm mortgage and
taxes, irrigation, cooperative selling and purchasing, farm labor,
farm expenditures in 1939, farm machinery and facilities, livestock
and livestock products, crops harvested on the farm in 1939,
and value of products used and of forest products sold in 1939.
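The instruction on the definition of a farm quoted above amounts to a short decision rule. The sketch below, in Python, is one reading of it and not the wording of the schedule: the function and parameter names are hypothetical, and the treatment of tracts under 3 acres (a schedule only if the products were valued at $250 or more) is an assumption inferred from the statement that the $250 limitation does not apply to tracts of 3 acres or more.

```python
def needs_farm_schedule(acres, product_value_1939, has_operations, in_doubt=False):
    """Decide whether an enumerator should fill out a farm schedule.

    acres              -- size of the tract
    product_value_1939 -- value in dollars of agricultural products in 1939
    has_operations     -- operations performed in 1939 or contemplated in 1940
    in_doubt           -- the enumerator cannot tell whether the tract qualifies
    """
    # "When in doubt, always make out a schedule."
    if in_doubt:
        return True
    # Some agricultural operations are required in every case.
    if not has_operations:
        return False
    # For tracts of 3 acres or more the $250 limitation does not apply.
    if acres >= 3:
        return True
    # Assumed rule for smaller tracts: products valued at $250 or more.
    return product_value_1939 >= 250


print(needs_farm_schedule(5, 0, True))
print(needs_farm_schedule(2, 100, True))
```

Writing the rule out this way shows why the unit must be defined before counting begins: two enumerators who disagree on the acreage or value threshold would report different numbers of farms for the same district.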

Only trained enumerators can successfully use such elaborate
questionnaires as those illustrated; only when properly instructed
can enumerators know how to get the information requested in
each question of these complex schedules. In some cases, the
questionnaires, or schedules of questions, are tried out by a
person-to-person call at the sources of information in advance of
collecting the data for the final enumeration. There was during
the summer of 1939 a trial census, covering a sample area in
Indiana, taken by the United States Bureau of the Census while
formulating the new and more complicated 1940 census schedule.

Statistics Obtained from Samples. Every twentieth person
enumerated on the 1940 census was asked supplementary questions.
The results constitute a 5 per cent sample. For the
sample of population, the following subjects were covered: the
usual occupation, industry, and worker class, as a supplement to
information obtained concerning present occupation, in order to
determine the availability of and shifts to various kinds of labor;
whether the respondent has a Federal social security account
number and whether wage deductions have been made for Federal
old-age insurance during the 12 months ending Dec. 31, 1939;
data showing the number of children born to women who are or
have been married (women married, widowed, or divorced), to
make studies of differential fertility; mother tongue, or native
language, obtained by a question asking what language was
spoken in the home in earliest childhood; the status of veterans of
foreign wars and their wives, widows, and children; and information
concerning the place of birth of the father and the mother of
the respondents.
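Taking every twentieth person is what is now called a systematic sample. The following Python sketch shows the idea on an invented list of 1,000 numbered persons; the function name, the list, and the choice of starting position are assumptions made for the illustration and do not describe how sample lines were actually designated on the census schedule.

```python
def every_kth(records, k=20, start=0):
    """Systematic sample: keep the records at positions start, start + k, ..."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return records[start::k]


# An invented enumeration list of 1,000 persons, numbered 1 to 1,000.
people = list(range(1, 1001))

# Start at index 19 so that the 20th, 40th, 60th, ... persons are chosen.
sample = every_kth(people, k=20, start=19)
fraction = len(sample) / len(people)

print(len(sample), fraction)
```

With one person in twenty selected, the sample fraction is 1/20, or 5 per cent, as stated in the text.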

This is the first decennial census in which the sampling process
has been applied, and the results of the experiment are eagerly
awaited by statisticians everywhere. While the decennial census
has always been presumably a complete enumeration, other governmental
statistics have frequently been drawn from samples.
Indeed, because of limited funds, it is necessary for the Bureau of
Labor Statistics to resort to the sampling method to obtain data
on wages and hours in industry. Preliminary to the collection of
such data, the census data for the industry are studied to determine
in which states the industry is of material importance.



Fig. 7. First portion of questionnaire used by the Bureau of Labor
Statistics to obtain data on union wages and hours.


Manufacturers’ directories are examined, and books and periodicals
relating to the industry are read, thus obtaining the vital
a priori background of knowledge to form the basis for sound
proportional sampling.¹

To obtain the data on wages and hours of labor, the Bureau of
Labor Statistics uses carefully prepared and elaborate questionnaires,
one of which is illustrated in Figs. 7 and 8. Trained
agents obtain the information from a responsible official of each
local union. Each scale of wages and hours is verified by the
union official interviewed and is further checked by comparison
with the written agreements when copies are available. For
example, in the building-trades survey for June 1, 1939, interviews
were obtained with 1,551 union representatives, and 2,729
quotations of scales were received. The union membership
covered by these contractual scales of wages and hours was
approximately 444,000. Great care is exercised to see that the
agents are adequately trained to collect the data; written instructions
are supplied them by the Bureau of Labor Statistics,
in which they are cautioned as follows:²

In the final analysis the accuracy and value of the entire survey must
rest upon the agents who collect the data. These data must be absolutely
correct and so presented on the schedule as not to be confusing
or ambiguous. Each agent is therefore requested to study thoroughly
the instructions, not once but repeatedly, and to question any point
therein which may not be perfectly clear. It is extremely important
that the agent check every schedule carefully before mailing to the office
to be sure that each item is correctly entered and explained. When
agreements accompany the schedules, the agent must compare each
quotation with the provisions in the agreement and must explain any
differences.

In order to ensure the collection of comparable data from all
agents, the instructions give painstaking definitions of “union
scale,” “collective agreement,” “apprentices” and “foremen,”

¹ For further details of the methods employed by the Bureau of Labor
Statistics, see Methods of Procuring and Computing Statistical
Information of the Bureau of Labor Statistics (1923), Bulletin 326; also Union
Scales of Wages and Hours in the Building Trades, June 1, 1939, Serial
R1034, from Monthly Labor Review, November, 1939.

² Bureau of Labor Statistics, Instructions for Survey of Union Scales of
Wages and Hours, 1939 (No. 7468), p. 1.



“union rates” and “actual rates,” “union rates” and “prevailing
rates,” and “averages.”

Study of Family Income and Expenditures. In 1929 the Social
Science Research Council suggested the advantages of conducting
a study of consumption in such a way that the sample would cover
a wide range of incomes, all types of natural families, and all
occupations within representative communities of different
sizes. Income data and certain other facts would be collected
from all families visited, through the use of a short schedule.
These data would provide the basis for selection of an adequate
number of families in each income class to furnish more careful
estimates of income and the details of expenditures. Following
these suggestions, the National Resources Committee and the
Bureau of Home Economics of the United States Department of
Agriculture completed in 1939 a study of family income and
expenditures. Figure 9 shows the questionnaire used.¹ Tables
of data based upon this questionnaire are shown in Chap. IV.
It may be noticed that the type of question and indeed the whole
schedule are much less complex, involving much simpler units,
than any thus far illustrated. It was necessary for this schedule
to be simpler than those discussed above because for the
consumer-income study the agents were not so well trained as are, for
example, the regularly employed field agents of the Bureau of
Labor Statistics.
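
The two-stage plan described above (a short schedule for every family visited, then a selection of families within each income class for the detailed schedule) may be sketched in Python. This is only an illustration under stated assumptions: the class limits, the quota of ten families per class, and the invented returns are hypothetical and are not figures from the study.

```python
import random
from collections import defaultdict

# Hypothetical upper limits (dollars) separating four income classes.
CLASS_LIMITS = (1000, 2000, 3000)

def income_class(income):
    """Return the index (0 to 3) of the income class for a reported income."""
    for index, upper in enumerate(CLASS_LIMITS):
        if income < upper:
            return index
    return len(CLASS_LIMITS)

def select_for_long_schedule(short_schedules, quota, seed=0):
    """Draw up to `quota` families at random from each income class.

    `short_schedules` is a list of (family_id, income) pairs taken from
    the short schedule collected from every family visited.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for family_id, income in short_schedules:
        by_class[income_class(income)].append(family_id)
    return {
        cls: sorted(rng.sample(ids, min(quota, len(ids))))
        for cls, ids in sorted(by_class.items())
    }

# Invented short-schedule returns for 200 families.
_rng = random.Random(42)
short_schedules = [(i, _rng.randint(300, 5000)) for i in range(200)]
chosen = select_for_long_schedule(short_schedules, quota=10)
for cls, ids in chosen.items():
    print("income class", cls, "families selected:", len(ids))
```

Fixing the number drawn per class, rather than drawing a fixed fraction of all families, is what keeps the thinly populated income classes adequately represented in the detailed schedules.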

Mailed Questionnaires. In some cases, especially where the
schedule of questions is comparatively simple, questionnaires
are sent through the mail to the sources of information. Such a
method may be used either where the units involved are very
simple or where those who are filling out the questionnaires are
known to be qualified to do so. The United States Bureau of the
Census and the Bureau of Labor Statistics have been able to
use this method to obtain certain types of information from
manufacturing concerns regarding employment, pay rolls,
manufacturing output, labor turnover, and the like. The method
appears to be most used where [illegible] facts are collected at
regular intervals. Data on pay rolls and employment are

¹ Bureau of Home Economics, U.S. Department of Agriculture,
“Consumer Purchases Study,” Part I, Family Income, Miscellaneous Publication
339, pp. 338-339; cf. National Resources Committee, Consumer Incomes in
the United States (1938), p. 49.





obtained by mailed questionnaires monthly by the Bureau of
Labor Statistics from representative manufacturing
establishments in 90 manufacturing industries.¹ Figure 10 is an
illustration of the type of letter used by such agencies to secure the good
will and cooperation of businessmen.²

Where the questionnaire-by-mail method is used, the returns 
must be carefully edited and subsequent correspondence is 
frequently required to correct mistakes made on the returns. 
Manufacturing and merchandising concerns in this country have 
become trained in the matter of filling out questionnaires for the 
government through years of practice so that there has been built 
up a cooperative enterprise between the government and business 
in the gathering of business statistics. Although sometimes feeling
the heavy burden of filling out numerous forms of this type,
business is nevertheless glad to cooperate because it is eager to 
see each month the compilation of business data that emanates 
from government sources. 

Income-tax returns are of the nature of questionnaires and
are a source of many important statistics. Everyone is familiar
with the care necessary in the examination of the units involved;
everyone who has had to handle a return or listen to the head of
the family talk about it knows how detailed and specific are the
printed instructions accompanying each form on which the return
is made. In the case of the income-tax return, which frequently
becomes so complicated as to require legal advice and expert
accountants, the penalty for failure to file a return is sufficient
to supply any incentive needed to overcome all obstacles. For
failure to supply information for the other types of questionnaire
that have been discussed, with the exception of the census, there
is no similar penalty; the business concerns fill out such
questionnaires in a spirit of public service and to obtain the resulting
compilations of data.

Rules for Constructing Questionnaires. Any investigator who 
is tempted to seek information by the questionnaire method 
will be well advised to spend considerable effort first, to make 
certain that the facts are not already available, and then to 

¹ Bureau of Labor Statistics, “Employment and Pay Rolls,” Serial R1052,
November, 1939, pp. 7, 11, and 16.

² This letter was used in January, 1940, with a new questionnaire revised
to obtain better monthly data on labor turnover.






Fig. 10. A typical letter from the Bureau of Labor Statistics seeking to secure
the good will and cooperation of businessmen in the reporting of statistics.





investigate well the pitfalls of questionnaire making, which is
a highly specialized art. There are six fundamental but simple
rules to be followed:

1. The interest of the recipients of the questionnaires must be
aroused or their cooperation obtained through some means.
This may be done by engaging the support of some organization
with which the individual informants are associated. For
example, if the questionnaire is to go to bankers, the support or
endorsement of the American Bankers Association should be enlisted.
Interest in the questionnaire may also be aroused by the promise
to furnish free copies of the summarized information when
compiled. In this manner and by the promise of secrecy regarding
individual returns, various governmental units obtain great
quantities of statistical information.

2. The questionnaire should be as short as possible, consistent
with the scope of information sought; and the individual
questions should be so formulated as to be free of all ambiguity.
They should be simple. Avoid presenting “problems” that will
puzzle the recipients of questionnaires.

3. Where possible, arrange the individual questions so that
replies can be brief and unequivocal. “Yes” or “no” or perhaps
merely a check mark is the ideal answer.

4. The letter transmitting the questionnaire should be
brief and dignified and yet should “sell” the idea to the
informants.

5. After all is prepared, try out the questionnaire along with
the transmittal letter on a dozen or so of the potential
questionnaire recipients in order to make final revisions before printing
the questionnaires, or schedules.

6. Always include a self-addressed stamped return envelope. 

The first five rules apply whether the questionnaire is to be
used by trained enumerators or to be sent by mail, but special
care must be exercised if sent by mail. Study of Fig. 9 will
reveal that answers to all questions are quite simple, in some
cases merely a check mark (see questions VI, 1, 2, 4, and VII),
in other cases the entry of a familiar numerical item. Less
highly trained enumerators are required for handling such a
questionnaire than are required for handling the United States
census schedules.





EDITING 

When the questionnaire is received from the agent or from the
respondent by mail, it must be examined. If any statement on
the schedule conflicts with other statements or if the schedule
is incomplete or lacks clearness, it may have to be returned to
the agent or respondent for explanation or revision. This is
called “editing” the returns or the schedules. In any case, a
certain amount of editing must always be done before tabulation
of the data is begun. When trained visiting enumerators have
been used in the survey, there will, of course, be a minimum
of mistakes. When the questionnaires have been filled out by
the informant directly, it may be necessary to write for further
information or for corrections because of inadvertent mistakes
in replies. If the respondents have been interested sufficiently
to return the questionnaire with answers filled in, they will
probably be willing to answer further simple questions to
elucidate their former replies. If it is believed that the information
has been deliberately falsified or withheld, it may be necessary
to discard the entire schedule or at least the replies in it that
seem to be of doubtful truth.

Editing the schedules is the process of preparing the original
statements in the schedule for classification, coding, and
tabulation. Careful editing is necessary in order to obtain compilations
of data that will truly reflect the conditions being investigated.
One task of editing is to see that all figures entered on the return
are clear. If not, the editor rewrites the figures. If so poorly
written that even the editor cannot read them, the schedule
must be abandoned or the information obtained by further
correspondence. If the editing is done locally, many of these
difficulties may be eliminated by telephoning.

The principal task in editing is to locate all incomplete,
inconsistent, or improbable and impossible answers. When such
answers are found, it is necessary either to discard the defective
schedules or to obtain correct replies through further inquiry.
This does not, of course, imply the elimination of “unexpected”
replies. An incomplete answer, for example, would be if
pneumonia is given as the cause of death; it is necessary to know
whether it is bronchial or lobar pneumonia. An inconsistent
answer, for example, would be if a return shows a person widowed





when from his age it is clear that he never could have been
married. If a person who is a male is reported having died of a
disease that is known to occur only in females, this is an
impossible answer. There is somewhat less distinct a line between
improbable and simply unexpected replies.
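
The three kinds of defective answer just distinguished can be sketched as a set of editing checks in Python. The sketch is illustrative only: the field names, the minimum age of twelve, and the list of causes of death confined to females are assumptions made for the example and are not drawn from any census instruction.

```python
# Hypothetical editing rules; thresholds and cause lists are illustrative only.
MIN_AGE_EVER_MARRIED = 12
UNSPECIFIED_CAUSES = {"pneumonia"}        # must be given as bronchial or lobar
FEMALE_ONLY_CAUSES = {"puerperal fever"}

def edit_entry(entry):
    """Return a list of (kind, note) flags for one line of a schedule."""
    flags = []
    cause = entry.get("cause_of_death")
    age = entry.get("age")
    # Incomplete: the answer is given but not in enough detail.
    if cause in UNSPECIFIED_CAUSES:
        flags.append(("incomplete", "specify bronchial or lobar pneumonia"))
    # Inconsistent: two statements on the same line contradict each other.
    if (entry.get("marital_status") == "widowed"
            and age is not None and age < MIN_AGE_EVER_MARRIED):
        flags.append(("inconsistent", "widowed at an age too young to have married"))
    # Impossible: the combination cannot occur at all.
    if entry.get("sex") == "male" and cause in FEMALE_ONLY_CAUSES:
        flags.append(("impossible", "cause of death occurs only in females"))
    return flags

# An invented return with two defects.
print(edit_entry({"sex": "male", "age": 8, "marital_status": "widowed",
                  "cause_of_death": "pneumonia"}))
```

A reply that is merely unexpected raises no flag under rules of this kind; only answers that fail a stated test are sent back for inquiry or discarded.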

Only after incomplete, inconsistent, or improbable and
impossible replies have been completed or corrected and all unclear
figures carefully clarified are the schedules ready for coding,
classification, and tabulation. For elaborate undertakings like
the census, instructions are printed not only for the guidance
of the enumerators but also for the editing and coding of the
returns. For example, it is pointed out that the examination
for completeness and consistency should be made family by
family and not line by line. It will be easier to follow the entries
belonging to the family if a strip of cardboard is placed across
the schedule just under the line containing the entries for the
last member of the family.¹ The coding and editing instructions
say that all corrections and code figures entered on the schedule
by the coding clerks should be made with red ink and a
medium-point pen (neither a stub nor an extremely fine pen). Such a
detailed instruction as this is necessary in order to secure
uniformity and when tabulation is undertaken will enormously
facilitate the work of the card-punching operators.

CODING 

Whether or not machine tabulation is used, the coding of the
schedules is a measure for economizing time. When large
amounts of data are involved, consistent classification is
enormously simplified by the use of code numbers. In arranging
data it is then necessary only to observe a code number
conspicuously and uniformly placed on the return instead of reading
a title and remembering to what class that title belongs. On
a Works Progress Administration project to construct indexes of
manufacturing employment and pay rolls in the state of New
Jersey, 1923-1940, it was not possible to obtain the use of
tabulating machines. It was found necessary, nevertheless, to use a
carefully worked out coding procedure to avoid hopeless
confusion in the handling of the data, which came monthly from

¹ Cf. United States Bureau of the Census, Instruction Manuals on Coding,
passim.





several hundred reporting firms. When machine tabulation is
used, the coding procedure is a necessary step; it will be noticed
that on the schedules (see Figs. 1 to 8) columns are inserted for
the code numbers or letters to represent the various types of
information on the schedule.

An Illustration of Coding. In the 1939 census of
manufactures, the manufacturing industries in the United States were
grouped into 20 groups, each with a number. Food and kindred
products constitute group 1; its code number is 100. Lumber
and timber basic products form group 5; its code number is 500.
Chemicals and allied products are group 9; its code number is 900.
All subgroups of industries in the food and kindred products
classification have code numbers in the 100's; for example,
beverages are numbered in the 180's: nonalcoholic beverages
is 181, malt liquors 182, wines 184, and so on. Grain-mill
products are numbered in the 140's: flour and other grain-mill
products is 141, cereal preparations 143, rice cleaning and
polishing 144, and so on. Confectionery and related products
are numbered in the 170's: chocolate and cocoa products is 172,
chewing gum 173, and so on. Similarly, subgroups of industries
in the chemicals and allied products classification have code
numbers in the 900's; for example, industrial-chemical industries
are numbered in the 980's: plastic materials is 982, explosives
983, coal-tar products, crude and intermediate, 981, and so on.¹
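
The nesting of these code numbers (hundreds digit for the major group, tens digit for the subgroup, units digit for the individual industry) can be sketched in Python. The tables below hold only the codes quoted in the text, and the function `decode` is a hypothetical helper. How groups 10 to 20 were numbered is not stated here, so the sketch is restricted to three-digit codes.

```python
# Only the codes quoted in the text; all other entries are deliberately absent.
MAJOR_GROUPS = {
    1: "Food and kindred products",
    5: "Lumber and timber basic products",
    9: "Chemicals and allied products",
}
SUBGROUPS = {
    140: "Grain-mill products",
    170: "Confectionery and related products",
    180: "Beverages",
    980: "Industrial chemicals",
}
INDUSTRIES = {
    141: "Flour and other grain-mill products",
    143: "Cereal preparations",
    144: "Rice cleaning and polishing",
    172: "Chocolate and cocoa products",
    173: "Chewing gum",
    181: "Nonalcoholic beverages",
    182: "Malt liquors",
    184: "Wines",
    981: "Coal-tar products, crude and intermediate",
    982: "Plastic materials",
    983: "Explosives",
}

def decode(code):
    """Split a three-digit industry code into (major group, subgroup, industry).

    Any level not listed in the tables above is returned as None.
    """
    if not 100 <= code <= 999:
        raise ValueError("this sketch handles three-digit codes only")
    major = code // 100          # hundreds digit: 182 -> 1
    subgroup = code - code % 10  # drop the units digit: 182 -> 180
    return (MAJOR_GROUPS.get(major), SUBGROUPS.get(subgroup), INDUSTRIES.get(code))

print(decode(182))
print(decode(981))
```

Because the class is carried in the leading digits, a clerk or a sorting machine can group returns by major group or by subgroup without ever reading the industry title.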

The classifications adopted by the United States Bureau of the
Census for the 1939 census of manufactures follow closely the
suggestions made by the Technical Subcommittee on Industrial
Classification, composed of representatives of various
government agencies.² The suggested classification of this
subcommittee, designated the Standard Industrial Classification Code,
was made according to the following principles:³

1. The classification should conform to the existing structure
of American industry.

¹ United States Bureau of the Census, “Industry Classifications for the
Census of Manufactures, 1939,” Form 7o.

² Members of the subcommittee included representatives of the
Department of Labor and Industry of New York State, the Federal Social Security
Board, the Bureau of Internal Revenue, the Bureau of Labor Statistics, the
Bureau of the Census, the United States Employment Service, and the
Central Statistical Board.

³ Central Statistical Board, May 10, 1938.





2. The reporting units to be classified are establishments. 
(An establishment is defined as a place of business. All persons 
working at the same location or place of business are classified 
in the same industry.) 

3. Each establishment is to be classified according to its 
major activity. 

4. Each industry group established must have significance
from the standpoint of the number of establishments and
employees involved, volume of business, employment and
pay-roll fluctuations, and other important economic features.

TABULATION 

When the schedules have been edited and coded they are
ready for the operations of the card-punch machines, and the
final machine tabulations are made from these punched cards.
The information on each schedule is transferred in code to the
punch cards. With a machine resembling a toy typewriter,
operators punch holes or combinations of holes in the cards so
that the electrically operated machinery for sorting and tabulating
can automatically transfer the information to totals by any
classification desired. The punch card somewhat resembles
the music roll of an old-time player piano, and most of the
operations through which it goes are mechanical and electrical.
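
The transfer of coded cards “to totals by any classification desired” corresponds to what would now be called grouping and summing. A minimal sketch in Python follows. The four cards, their field names, and the employee counts are invented for illustration, and the function `tabulate` is a hypothetical stand-in for the sorting and tabulating machinery.

```python
from collections import defaultdict

def tabulate(cards, key, field="employees"):
    """Total one numeric field over all cards, classified by `key(card)`."""
    totals = defaultdict(int)
    for card in cards:
        totals[key(card)] += card[field]
    return dict(totals)

# Invented cards: one per reporting establishment.
cards = [
    {"industry": 181, "state": "NJ", "employees": 40},
    {"industry": 182, "state": "NJ", "employees": 25},
    {"industry": 141, "state": "PA", "employees": 60},
    {"industry": 982, "state": "NJ", "employees": 15},
]

# The same deck of cards yields totals under different classifications.
by_major_group = tabulate(cards, key=lambda c: c["industry"] // 100)
by_state = tabulate(cards, key=lambda c: c["state"])
print(by_major_group)  # {1: 125, 9: 15}
print(by_state)        # {'NJ': 80, 'PA': 60}
```

The point of the sketch is that the cards are punched once and then re-sorted for each table wanted, which is why careful coding before punching repays the effort.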

The 1930 census required the punching of 326,635,219 cards,
which required an additional handling for verification. These
cards represented 2,000,000 pounds of paper and would make a
belt reaching nearly twice around the world at the equator.
Punching, tabulating, and related work were equivalent to the
handling of 4,701,671,697 cards once.
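
The scale of these figures is easier to grasp when reduced to averages. The short Python calculation below uses only the three numbers quoted above. The derived quantities (handlings per card and cards per pound) are simple quotients worked out here for illustration and are not figures reported by the Bureau.

```python
# Figures quoted in the text for the 1930 census.
cards_punched = 326_635_219
equivalent_single_handlings = 4_701_671_697
paper_pounds = 2_000_000

# Average number of times each card passed through some operation.
handlings_per_card = equivalent_single_handlings / cards_punched

# Average number of cards in one pound of card stock.
cards_per_pound = cards_punched / paper_pounds

print(f"handlings per card: {handlings_per_card:.2f}")
print(f"cards per pound:    {cards_per_pound:.1f}")
```

On these figures each card was handled, on the average, a little more than fourteen times, and about 163 cards weighed one pound.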

The Bureau of the Census has its own unit tabulating
equipment. Some of these machines can digest 400 cards a minute.
The unit machines were invented and developed within the
Bureau by Herman Hollerith, who was employed in the Bureau
and invented the first machine to tabulate the 1890 census.
He is now known as the “father of machine tabulation,” a method
used throughout the world by governments and business to handle
large statistical jobs.



CHAPTER III
SOURCES OF STATISTICS 


Primary and Secondary Sources. The original collector of
data is their primary source. Generally speaking, data obtained
from a primary source inspire greater confidence than the same
data taken from a secondary source. The primary source is
presumably the one sure place to find the exact definition of the
units of observation involved. Subsequent reproductions of
the data may fail to reproduce this essential information and
lead to a misunderstanding of the true meaning of the data.

The United States Bureau of the Census is the primary source
of population data, of census data in general, and of all the
statistical data published by the United States Department of
Commerce, for the Bureau of the Census is the data-gathering
agency of the Department. The Bureau of Foreign and Domestic
Commerce, on the other hand, is a large retailer of statistical
data gathered not only from the records of the Bureau of the
Census but also from numerous other governmental and
nongovernmental sources. While governmental publications are
thus not uniformly primary sources, they are usually very
careful to give exact reference to the primary sources and to
define units adequately.

In some cases, secondary sources may be better than the
primary. Such is the case when experts presumably better
qualified than the general run of statistical researchers have
selected the good statistics from the poor ones in some primary
source that may be either obscure and difficult to obtain or of a
highly technical nature. Occasionally a secondary source
performs the valuable function of selecting data impartially
from primary sources that are biased in one way or another.
Sometimes it is necessary also to be on guard against bias in
government sources.¹

¹ Hinrichs, A. F., “Statistical Bias in Primary Data and Public Policy,”
Journal of the American Statistical Association, Vol. 33 (1938), pp. 143-152.





Natural Sciences. After the development of the statistical
theories of gases (Charles's law, Boyle's law, Avogadro's law,
the work of Gay-Lussac, and the like) the physical sciences and
arts accumulated source materials of a statistical character.
Beginning with the last quarter of the nineteenth century,
biology and zoology also accumulated source materials of a
statistical character when a group of English biologists concluded
that mass observation was necessary for the successful solution
of their problems.¹

Nongovernmental Sources. Statistical data of the natural
sciences consist to a large extent of hypothetico-observational
or experimental data. The principal sources of these data are
handbooks of the special fields of study and monographs written
by scholars at the great centers of research. For example,
sources of astronomical data are the observatories located in
various places throughout the world. The sources of currently
discovered data in biology, physics, and chemistry are the
laboratories maintained by universities, by private business
enterprisers, or by such institutions as the Smithsonian
Institution at Washington, D.C.

Additional primary sources of statistics in the natural sciences
are the several hundred technical journals, publications of the
learned societies, trade journals, publications of commercial
research organizations, college bulletins, and the publications of
endowed research enterprises. Fortunately for those who desire
to make use of them, the data currently accumulated in such
sources are summarized or abstracted in publications that
maintain sections of their respective issues for the purpose.²

Statistical data for the natural sciences are also found in
handbooks for the numerous special fields of study. For example,
there are handbooks in medical entomology, physical therapy,
geology, botany, experimental physics, and geophysics.³

¹ Anderson, O. N., “Statistical Method,” Encyclopaedia of the Social
Sciences.

² A partial list of such abstracting agencies is as follows: Science Abstracts,
Abstracts of Geology, Abstracts of Bacteriology, Abstracts of Chemical
Papers, Zentralblatt für Mathematik, Jahrbuch über die Fortschritte der
Mathematik, Physikalische Berichte, and Biometrika.

³ Handbook of Physical Therapy (1939); Handbuch der allgemeinen Chemie,
unter Mitwirkung vieler Fachleute (1918-1937); Handbuch der
Experimentalphysik (1926-1935), 43 vols.; Handbook for Chemistry and Physics.





Governmental Sources. Sources of data in the natural sciences
are enormously supplemented by governmental agencies. The
government weather bureau supplies current and historical
data important to many kinds of research in such natural sciences
as botany, zoology, and geology. The Minerals Yearbook,
published by the United States Department of the Interior, is a
source of data for natural scientists. The Geological Survey
is a source not only of geological data but of data on electrical
power production and other information useful to engineers.
Engineers also find that government agencies are sources of
statistics on railroads, flood control, roads, and other similar
subjects having to do with construction.

Biologists find the chief source of modern vitality statistics
of all sorts among the publications of governmental agencies.
An important source of statistical data for medical men results
from medical research recorded in the files of hospitals, some of
which are governmentally operated.

The quantity of statistical data relating directly to the natural
sciences is thus large, but the natural sciences in addition make
extensive use of the highly organized mass of statistical data
collected largely by social scientists. Scholars in the natural
sciences frequently make use of statistics concerning social and
economic events. It is not at all uncommon for data concerning
the behavior of human beings to enter into the calculations of
engineers, physicists, and chemists engaged in practical business
enterprise or pure research. Some illustrations were given in
Chap. I.

Social Sciences. Genesis of Statistical Sources. The increasing
complexity of economic and social life has furnished the motive
for the systematic marshaling of statistical data about human
society, and in addition the dynamic quality of modern life
makes it necessary to repeat statistical enumeration frequently in
order to have knowledge of current facts and, what may be
more important, knowledge of change. In the static conditions
of earlier times one public fact-gathering enterprise could serve
for years as a basis for judgments and for political and social
action. Under modern dynamic conditions this is not the case.

In a democracy the timing of governmental action is dependent
on the consent of the people, and that requires widespread
knowledge of many economic and social facts and their
interpretation. If democracy is to preserve its high standards of
achievement, its powers of expression in the face of tremendous
forces that appeal to sentiment rather than to reasoned judgment,
adequate factual information must be in the hands of the voters
and of their governmental administrators and representatives
in time for necessary action. Modern business enterprisers,
too, faced with rapidly changing conditions, must lean more
and more heavily on statistics to point the way to the solution
of their problems.

During a great national crisis, such as a severe depression or a
war, the value of statistical data is enormously enhanced. In
depression periods, published statistical data from governmental
sources, which in retrospect appear to have been but a trickle in
prosperity, swell to flood proportions. Modern war, moreover,
as well as being a “war of supply,” “a war of machines,” or a “war
of production,” is a “war of statistics.” The fact that much
of the increasing wartime volume of statistical data is confidential
explains the apparent and deceptive appearance of fewer
statistics in wartime than in peacetime. During the Second World
War the statistics published by the United States Bureau of the
Census, for example, sharply decreased because its organization
and equipment were almost fully employed doing wartime
statistical work, especially for such agencies as the War Production
Board and the Office of Price Administration.

So diligent have been the efforts to obtain current knowledge by
means of statistics during the past fifty years that a vast source of
raw material now exists, covering many fields of knowledge.
Elementary acquaintance with these sources is essential to all
those who hope to work in either the natural or the social sciences.
Complete familiarity with sources of statistics can come only with
long practice in their use. It would be futile to attempt to impart
to the student this desirable familiarity by giving a complete
description of all sources.

The Pattern of Statistical Sources. The student cannot hope to
memorize the names of all sources of statistics; indeed, the
attempt would not be useful, for the names change and new ones
are added as time goes by. Comprehension of the pattern of
development of statistical sources, however, will enable the
student to become a scholar who, when confronted by a statistical
problem, will have acquired a “statistical sense” that will guide





him to the appropriate sources. This presumption explains why
the present chapter on sources is given an historical or a genetic
setting. Let the names of all the statistical agencies be changed
by the Second World War, and the study of the historical and
genetic explanation of statistical sources will still help the student
acquire that scholarly ability required to locate sources; he will
have historical perspective to facilitate his prompt understanding
of the postwar world of statistical sources. In any case, the
period between the First and the Second World War will long
continue to be one intensively studied by statisticians of coming
generations.

In the ensuing description of sources of statistics, which is
presented in its historical or genetic aspects, governmental
sources are given more space than nongovernmental sources
because the general statistician deals mostly with the former.

While the specialized statistician must acquire detailed knowledge
of sources in his special field, he also needs to be familiar with
governmental sources in his field. Governmental sources,
moreover, are themselves one of the best guides to the successful use of
nongovernmental sources because many governmental agencies
are secondary sources that give complete and very useful
descriptions of the primary sources used.

The motive underlying the gathering and publication of
statistics by private enterprise has usually been the profit
available through the sale of such statistical information to
commercial, banking, and manufacturing or distributing
enterprises. In many instances these services emerged as incidental
features of existing publications; an example is the increasing
amount of statistics of all kinds published in newspapers and
periodicals. In other instances the statistical feature was the
original purpose of the publication; many trade journals are
cases in point.

The state and privately endowed universities of the nation are
important sources of statistical research, especially of a pioneering
character, in all branches of knowledge, some being famous for
certain fields of statistical work.

During recent years one of the most striking aspects of big
business development has been the maintenance of research
organizations contributing to statistical knowledge, a fact that the
public was not permitted to forget as it visited the 1939 World's





Fair in New York and read newspaper and magazine wartime
advertisements in the early 1940's. Some corporation-financed
research organizations, primarily intended for profit making, have
incidentally contributed in important ways to the advancement of
scientific statistics in engineering, business, and the use of
agricultural products. Most of the pioneering statistical research
in agriculture, however, as well as in labor organization, wages,
and the like, is done by governmental units or by the
governmentally sponsored agricultural experiment stations connected
with various state colleges or universities.

The motive underlying governmental activity in the collection
and publication of statistics has been to increase knowledge of
facts so that administrators may adjust government action to the
changing needs of a dynamic society, so that democratic
representatives of the people may legislate more expeditiously and
wisely, and so that the voters in a democracy may have the
opportunity to know the facts. In recent years, a great
expansion in the governmental activity of collecting and publishing
statistics to aid business enterprise has occurred. In short,
governmental statistical agencies assist both public and private
economic planning. The large quantities of statistical
information released by the Department of Labor and Department of
Commerce are eagerly awaited by business enterprisers seeking to
keep up to date in their methods, labor policies, coverage of
potential markets, and knowledge of desirable sources of raw
materials. True, their zeal in filling out the questionnaires that
constitute the sources of the desired statistics sometimes falters,
but on the whole businessmen recognize the truly cooperative
character of the system of collecting and disseminating business
statistics and stoically endure the barrage.

As a consequence of the manner of their historical and genetic
origin, therefore, modern statistical sources in the United States
fit into a pattern that is more or less uniform among the various
fields of knowledge. This pattern is roughly as follows:

Research of private enterprisers:

  Individual enterprisers. Special monographs, articles, and other
  contributions are made by individuals and published under the
  sponsorship of universities, professional publications, and the like.

  Research associations. Quantities of statistical data are collected by
  research organizations, some of which are hired by corporate or
  noncorporate private enterprise in the business world, some
  connected with universities, and some independently endowed.

Commercial sources, i.e., privately financed publications:

  These sources are in the business of collecting and publishing statistics
  as a profit-making enterprise; they include
    Trade journals
    Commercial and financial periodicals

Official publications of the government:

  Federal or national governmental agencies
  Local, i.e., state or municipal, governmental agencies

Guides to Sources of Statistics. If a trained professional
librarian is available for consultation, he is the best informant on
the subject of guides or handbooks to all general fields of research.
However extensive may be the experience and training of the
research scholar, he finds himself continually relying upon the
local librarian, who makes a specialty of keeping posted on new
developments with respect to handbooks and literary guides of all
kinds.

Guides to Nongovernmental Statistics. Practically every
conceivable occurrence in the world of man or beast, in the heavens,
on the ground, under the ground, on the sea, under the sea, or in
astronomical space holds an interest for some individual or group
of individuals; either as a hobby or as a means of livelihood some
individual or group of individuals is now and has for many years
been collecting statistical facts about all these world events. The
existing sources of statistics necessarily therefore appear at first
glance to be an unwieldy mass, but, fortunately both for beginners
and for practiced scholars, this mass has been for some time
culled over and classified, indexed and cross-indexed by various
types of handbooks, yearbooks, or guides of one sort or another.
The general magazine indexes constitute one class of such
guides; the principal ones are as follows:

Agricultural Index
Education Index
Engineering Index
Industrial Arts Index
Public Affairs Information Service
Readers’ Guide to Periodical Literature

Such indexes or guides are compiled monthly and cumulated
into annual volumes, and articles of a statistical character
appearing in a comprehensive variety of journals and trade magazines
can be discovered by the intelligent use of these alphabetically
arranged indexes. The above-listed indexes are not specifically
organized as guides to statistical sources; their collective purpose
is as broad in scope as all modern knowledge, but one of their
varied uses is to serve as guides to sources of statistics.

Indexes or handbooks specifically dedicated to serve as guides
to sources of statistics do, however, exist in considerable number.
In 1937 the United States Department of Commerce published
“Sources of Current Trade Statistics” (Market Research Series
13), which lists practically all current trade statistics by
governmental and nongovernmental agencies; this handbook was
designed for the use of manufacturers, distributors, financial
institutions, advertising agencies, trade associations, bureaus of
business research, and individuals engaged in research work. In
1942 the United States Department of Commerce published a
handbook entitled Trade and Professional Associations of the
United States, which lists the sources of practically every
conceivable type of trade statistics compiled by nongovernmental
agencies.

In 1934 a scholarly attempt was made by Gerlof Verwey and
D. C. Renooy to construct a manual of statistical sources under
the title The Economist’s Handbook; this book was published in
Amsterdam, Holland, and a supplement appeared in 1937. It is 
a guide to statistical sources on economic subjects, covering 
Belgium, France, Germany, the Netherlands, Switzerland, the 
United Kingdom, and the United States. In the United States, 
D. H. Davenport and F. V. Scott were authors in 1937 of An Index 
to Business Indexes, a book containing information about the 
many indexes used in business, including the name of the compiler, 
description of the index, frequency of publication, period covered, 
and the name of the publication in which current data appear. 
In 1937 the Special Libraries Association published a handbook 
Guides to Business Facts and Figures in which Part III is an index 
of statistical sources of information. 

A multiple assortment of handbooks in various special fields 
serve as guides to statistics in each special field of knowledge, 
along with other purposes for which the handbooks are issued. 
For example, Management Handbook, Flitcraft, and Handbook
of Accountants serve, in their respective fields, as guides to
statistical sources. 





Often the purpose of a handbook or index of sources of statistics
is served by one of the numerous abstracts of statistical
data. The Statistical Abstract of the United States, published
annually by the United States Department of Commerce, is
itself a source of statistics, but it is also an index to sources
because at the head of or in a footnote to each table of data it
records the primary source from which the data are obtained.
Similarly, the World Almanac, which for 58 years has been
published by the New York World or the New York World-Telegram, is
itself a source of statistics and also a guide to sources, for the same
reason.

Guides to Governmental Statistics. Many of the handbooks
serving as guides to statistical sources compiled by nongovernmental
agencies include also in their alphabetical indexes a large
range of governmental statistical sources as well, but there are a
number of important handbooks specifically intended to serve as
guides to the maze of governmental sources of statistics. The
best known and most comprehensive guide is the United States
Government Manual, published by the government. In 1938 the
Central Statistical Board (later the Division of Statistical
Standards of the Bureau of the Budget) published a Directory of
Federal Statistical Agencies. The Central Statistical Board was
organized in 1933 in order to find some means of coordinating the
various types of Federal statistics.¹ The business of the Central
Statistical Board was to serve as an agency for the reorganization in
collection, tabulation, and use of Federal statistics. It was hoped
such an agency could help solve the problem of overlapping in
statistical function, which caused unnecessary burdens upon
respondents to questionnaires and which also resulted in
inefficiency in the utilization of statistical information.

1 In the task of perfecting Federal statistics the government has received
the advice of scientific professional associations. See American Statistical
Association and the Social Science Research Council, Government Statistics,
A Report of the Committee on Government Statistics and Information Services
(1937).

In response to a request by the President in a letter of May 16,
1938, the Central Statistical Board made a report on the question
as to whether or not it is possible to reduce the amount of
duplication in statistical reports. The board concluded that much
could be done in the way of coordinating the gathering, tabulation,
and presentation of Federal statistics; by such coordination,
comparability in definition would bring about a great improvement
in the efficiency of data collected. With reference to the
reduction in the amount of duplication, however, the board
concluded that a majority of the financial and other statistical
reports and returns made by the public to the Federal government
are incidental to the administration of governmental
functions; the statistics are a by-product of either administrative
or control functions of the government. Consequently, the
board recommended that the Federal statistical and reporting
services should remain largely decentralized so that they may be
associated with the respective governmental functions to which
most of them specifically relate; but that there is a continuing
need for a statistical coordinating agency, with a specially
trained staff and with broad powers.¹ One important result
of the coordinating functions of the Central Statistical Board
was the publication of a directory of federal statistical agencies,
which has already been mentioned.

A general guide to government publications, Anne Morris 
Boyd’s United States Government Publications (1941), serves 
incidentally as a guide to governmental sources of statistics. 
This book also gives an analytical picture of the character and 
scope of government publications. The same may be said
regarding Laurence F. Schmeckebier’s Government Publications 
and Their Use (1939). 

RESEARCH OF PRIVATE ENTERPRISERS 

Individual Enterprise. Pioneers. In spite of the fact that
Domesday Book was an eleventh-century product and that even
earlier examples of governmental collection of statistics can be
cited, it remains true both historically and currently that the
pioneer work of converting public records into statistics is
nongovernmental. The pioneers have been and are individuals.
The father of modern vital statistics is John Graunt, who in the
seventeenth century made statistical investigations that served
as the basis for founding life insurance. Another
seventeenth-century scholar, Sir William Petty, was the outstanding
pioneer in developing statistics for the social sciences. Both these
men were associated with the early development of the Royal
Society of London, which was incorporated in 1662 and is the
oldest of modern learned societies.

1 Report of the Central Statistical Board, 76th Congress, 1st Session,
House Document 27, Jan. 10, 1939.

Pioneering in the art as well as the science of statistics
continues in modern times to be highly individualistic. This is
exemplified by the work of Karl Pearson in England¹ and in the
United States by such men as Wesley C. Mitchell and his works
on index numbers and the business cycle, Warren Persons and
his work on the statistical analysis of business statistics, and
many others.² Individual contributions are commonly presented
in the publications of learned societies such as the Journal of the
Royal Statistical Society, the Journal of the Statistical Society of
London (founded in 1834), and the Journal of the American
Statistical Association (founded in 1839). These and the publications
of other learned societies are indexed in the guides mentioned
earlier in this chapter.

Research Associations. During the 1920’s and 1930’s a number
of important research organizations in the field of economics
and social institutions were organized. The Brookings Institution
in Washington, D.C., the Harvard Committee on Economic
Research, the National Industrial Conference Board, the National
Bureau of Economic Research, and the Cowles Commission
were among the most prominent.

The Harvard Committee on Economic Research was organized
in 1919 to study business trends and cycles and to publish a
scientific business forecaster; its work was launched under the
leadership of Warren Persons. In addition to the forecasting
service this research organization publishes the Review of
Economic Statistics (quarterly) and once or twice a year a summary
of statistics called the Statistical Record.

The National Industrial Conference Board was organized by a
group of comparatively public-spirited manufacturers to study
the various problems of employer-employee relationships, leading
them into special studies of real wages, income distribution, and
general economic conditions. It publishes its studies in the
form of books appearing as they are written. In addition to
the subjects mentioned above, there have been National Industrial
Conference Board books on cost of living, statistics of income
by states, and availability of bank credit. The National Industrial
Conference Board has also published since 1940 The Economic
Almanac, which is a widely used annual.

1 See Chaps. \III and
2 See Chaps. \I\ and W

The National Bureau of Economic Research was founded in
1920, sponsored by a group who believed that a purely
disinterested approach is desirable and that no group should control
the findings of this new statistical organization. It is so
constituted as to produce this desirable result. A number of special
studies of economic and social conditions have been made and
published under its auspices and some in cooperation with the
government. For several years it has occasionally issued
bulletins containing data resulting from studies that usually
appear later in more detail in book form.

The nature and accomplishments of the National Bureau of
Economic Research are indicated by the following quotation from
the twentieth annual report of the director of research:¹

The National Bureau was established by men who believed that it is
becoming possible to apply quantitative methods to the study of
economic behavior. They realized that this field is far more difficult than
the fields in which science has won its major triumphs and demonstrated
its practical usefulness most conclusively. Also they recognized that
investigators cannot experiment at will upon society; though society
can and does experiment loosely upon itself. . . . Economics was not
likely to grow faster at this turning point in its career than its elder
sisters [the natural sciences]. But at the close of the First World War
the materials for observing actual behavior were multiplying so rapidly
and analytic methods of extracting significant conclusions were becoming
so versatile and powerful that our founders thought their staff had good
prospects of rendering valuable service at once. Also they hoped that
one modest success would lead to others, fostering cumulative growth
of the kind that has characterized systematic research in other fields. . . .

Twenty years of effort along the lines laid down in 1920 have
confirmed our faith in the social value of what the National Bureau set out
to do. Our accomplishments have not been spectacular, but they have
been substantial, and they afford a secure foundation on which to build
in future. We have more reason than ever to believe that in trying to
establish a few economic fundamentals firmly we are aiding thoughtful
men of all persuasions to plan wisely. If tested knowledge is the safest
and surest guide in practical affairs, our work has social meaning,
however technical its character. We hold that advance will be rapid
and continuous in proportion as the workings of our economic system
are understood. In trying to replace speculative opinions about
economic relations by conclusions resting upon evidence we are expediting
progress in the most effective manner we know.

1 Mitchell, Wesley C., “The National Bureau’s Social Function,”
March, 1940, pp. 13-15, 19.

Another device, peculiar to the National Bureau, is to select
directors who have divergent views on public policy and give each an
opportunity to criticize every manuscript. That device has been of
inestimable help to us in keeping our reports nonpartisan and therefore
worthy of credence by the public. Having such a board we cannot
expect unanimous consent from its members to many policies that
individuals among us favor. But the mere fact that the National
Bureau never takes sides upon controversial issues adds its bit of
protection against bias in our publications and helps toward meriting and
winning public confidence.

The more thoughtful sections of the public we are now reaching in
various ways. Physical scientists are coming to recognize the
contributions of research in economics; for example, in I Believe, Robert A.
Millikan says:

In economics and the social sciences long and elaborate statistical
studies must be made in order to eliminate the disturbing factors and
thus obtain the controlled conditions. We are just beginning to have
available through the National Bureau of Economic Research and other
similar agencies a large amount of such definite, dependable, statistical
knowledge in economics.

The Twentieth Century Fund is another research association
organized to function in a manner similar to that of the National
Bureau of Economic Research. It publishes occasional pamphlets
or books.

THE COMMERCIAL SOURCES

In addition to the numerous sources of statistics resulting from
individual or group research such as those described above, a
great quantity of statistical sources has come into existence as
the result of the activities of those who go into the business for
the profit of collecting and selling statistical data. Such are the
trade journals and the commercial and financial periodicals.

Trade Journals. A large number of trade journals are actively
engaged in collecting statistical data for various types of
enterprise. The Iron Age, for example, founded in 1855, is the trade
journal for the iron and steel industry, publishing statistics on
iron and steel production in all states and the prices of iron, steel,
copper, zinc, etc. Another example is Wileman’s Brazilian
Review, which is the trade journal for coffee. The trade journals
are frequently used by governmental statistical organizations,
such as the Bureau of Labor Statistics, the Department of
Commerce, and the Board of Governors of the Federal Reserve
System, as the primary sources of particular data assembled
by them. Occasionally the trade journals will publish in special
pamphlet form or in books assembled data of the trade.

During the 1920’s, a large expansion in the collection and
publication of statistics in various lines of economic activity on
physical commodity production and distribution took place.
In a few instances this work was done by private companies.
Thus Seidman and Seidman compiled data on furniture for the
Grand Rapids district, and R. L. Polk and Company compiled
data on new cars registered; the function of the latter was
subsequently taken over by Ward’s Automotive Reports. Many
such series were compiled by the trade journals from public
records. The Iron Age compiled data on physical quantities
of production of pig iron, and the Statistical Sugar Trade Journal
published quantitative sugar statistics.

Trade Associations. Most of the production and distribution
series are compiled by the various trade associations, such
as the American Face Brick Association (merged with the
Structural Clay Products Institute), the American Paper and
Pulp Association, and the United States Cane Sugar Refiners’
Association.

The production and distribution statistical series are of various
types. Some measure the flow of commodities through the
process of production and distribution, for example, data on
raw material received or consumed, like the figures on cotton
consumption by textile mills or on cattle receipts at stockyards.
Others give a measurement of quantity or stock of a commodity
on hand. Still others are figures on the amount of orders or
sales of the product, such as the unfilled orders of the United
States Steel Corporation. As noted elsewhere, many of these
series are collected from their original sources and published
by the United States Department of Commerce in the Survey
of Current Business. Consequently, the appendix of the Survey
contains a description of about every important commercial
source of statistics. In fact the Department of Commerce
publishes a description of such statistical sources.¹ Frequently
a trade association will publish a sort of handbook or abstract
of statistics for the trade, covering historical as well as current
statistics.²

Commercial and Financial Publications. The commercial
and financial journals and services are also too numerous to
mention in detail, but a few may be described as typical. Among
these are the Commercial and Financial Chronicle (weekly), the
New York Journal of Commerce (daily), the Wall Street Journal³
(daily), Bradstreet’s (merged in 1933 with Dun’s), Babson’s
Reports, Moody’s Investors Service, Standard & Poor’s Corporation,
Brookmire Economic Service, and the Dodge Statistical
Service.

While there is much overlapping of published commercial
and financial statistics through these various publications and
services, nevertheless each has become noted for especially good
statistical service in a particular line. For example, the user of
business failure statistics thinks first of Bradstreet’s because for
many years the data that it has published on business failures
have been widely used. Bradstreet’s was also famous for its
index of wholesale prices for the United States, being a pioneer
in the development and publication of such an index. Babson’s
and Brookmire’s services are noted for business forecasting and
for investment services and forecasting the stock market. The
New York Journal of Commerce is noted for its current data on
new securities issued and on the produce markets. The New
York Times is noted for its index of business activity, which
was published in the Annalist (weekly) until that periodical
was discontinued. The Commercial and Financial Chronicle is
particularly useful for its detailed array of current data on bank
clearings, business failures, interest rates, stock and bond prices,
corporations’ capital stock and bond issues, and the money
markets of the world. This remarkable publication is
traced in its lineage back to 1820, when it started as Niles’
Weekly Register, famous as an early preacher of the doctrines of
high tariffs and the “American system.” From 1839 to 1865 it
was called Hunt’s Merchant’s Magazine. Since 1865 it has
gone under its present name. The financial statements of all
kinds of corporations, together with other statistics and corporate
histories, are to be found in Moody’s Manual of Corporations.

1 Sources of Current Trade Statistics, Market Research Series 13 (1937).
2 United States Cane Sugar Refiners’ Association, Sugar Economics,
Statistics and Documents (1938).
3 Often referred to in footnotes as Dow, Jones and Company, which
sometimes mystifies beginners.

The Commodity Yearbook is published by the Commodity
Research Bureau, New York, N.Y. This is a private organization
devoted to the dissemination of accurate information on
commodities and other related subjects, including production,
consumption, prices, stocks, imports, exports, etc. Some are
annual, some are monthly data.

All the above-described sources are extensively used by
American and foreign business enterprisers, whose subscriptions
to them and advertising in them make possible the vast statistical
undertakings on a profitable basis. The fact that they are so
supported would seem to prove the value of statistics to modern
business enterprise.

OFFICIAL PUBLICATIONS OF THE GOVERNMENT 

Federal Statistical Agencies. Department of Commerce. The
Department of Commerce is one of the greatest fact-gathering
organizations of the Federal government, if not the greatest.
It contains a number of bureaus chiefly engaged in the
dissemination of facts concerning not only commerce but economic and
social life in general. The Bureau of the Census is the
fact-gathering agency of the Department.

The Articles of Confederation provided for the taking of a
triennial census, but the Constitution of the United States
provides for the taking of a population census every 10 years,
to serve as the basis for Congressional apportionment. The
first one was taken in 1790. The broad practical and scientific
purposes that the census today serves were not in the minds of
the American founders, and the earlier census publications were
meager affairs compared with the modern census.¹ The census
of 1790, for example, returned the number of free white males
over, and the number under, sixteen years of age, the number of
free white females without distinction by age, all other free
persons, and slaves, without, in the case of the last two classes,
distinction by either sex or age. The published census of 1790
consisted of a volume of 52 pages. At the census of 1800 and of
1810, five age classes were distinguished and the age classification
was extended to white females. In addition, at the census of
1810, some facts were compiled relating to manufacturing
establishments, their number, nature, extent, situation, and value.
A digest of the results of these data was prepared by Tench
Coxe and published in 233 pages. The census of 1820 introduced
the idea of collecting occupational statistics, calling for
enumerations of persons engaged in agriculture, commerce, and
manufactures. The census of 1830 returned to the original idea of
obtaining merely a population enumeration, but in 1838 President
Van Buren suggested to Congress in his annual message
that the census should be extended so as to include “authentic
statistical returns of the great interests specially entrusted to or
necessarily affected by the legislation of Congress.” As a
result, Congress provided in the act for the Sixth Census (1840)
that the marshals should “return in statistical tables all
such information in relation to mines, agriculture, commerce,
manufactures, and schools, as will exhibit a full view of the
pursuits, industry, education, and resources of the country.”
Congress overreached the capacity of those entrusted with the
task of census taking, for the census of 1840 is famous for its
inaccuracies. At the census of 1850, improvements in the
organization of collecting and compiling the statistics were
made, and, according to Cummings, with the census of 1850
the decennial enumeration began to assume modern proportions
and character.

1 Cummings, John, “Statistical Work of the Federal Government of the
United States,” in Koren, History of Statistics, pp. 670-672.

One of the outstanding American economists of the nineteenth
century, Francis A. Walker, was a pioneer in developing the
census to what we understand it to be now. He did particularly
notable work in perfecting the organization and presentation
of statistical data in the Tenth Census (1880), of which he had
charge.

1 Ibid., p. 672.
2 Ibid., pp. 672-678.

At the Eleventh Census (1890), machine tabulation was
introduced (the Hollerith tabulating machines), at a great
saving of time and expense. The printed reports of the census
of 1890 aggregated 21,410 pages, in 25 quarto volumes, the final
report being issued in 1897. The Bureau of the Census was
established as a permanent one in 1902 and since that time has
been in continuous operation as a great fact-gathering organization
for the national government. The tendency since that
time has been to confine the decennial census to the major
subjects of population, manufacturing, agriculture, mines, and
quarries, and in intervening years to take censuses of business.
In intercensus years the Bureau also has charge of the annual
collection of mortality data, statistics on religious bodies, the
collection and compilation of statistics of cotton and tobacco,
and the annual compilation of statistics of cities of 30,000
population and over, and financial statistics of states.¹

After 1902 the census of manufactures was taken every
5 years until 1919, and from 1919 every 2 years until 1939. The
census of agriculture has been taken every 5 years since 1910.
The Statistical Atlas (containing graphic illustrations of much of
the census data) was first issued in 1874 [based on the Ninth
Census (1870)] and has appeared irregularly since that date.

In 1929 a census of distribution as well as of manufactures
was taken; but when the National Recovery Administration
began operations, many of the data assembled in the census
year 1930 were out of date owing to the sharp business recession
and the increase of unemployment following that year. Along
with the regular biennial census of manufactures for 1933 the
Bureau of the Census undertook an extensive census of business
of types other than manufacturing, such as amusements, service
businesses, barbershops, beauty parlors, repair shops, and tourist
camps, covering more than 2,400,000 individual establishments.

1 By order of the Secretary of Commerce, the collecting of financial
statistics of states was discontinued temporarily after the 1931 report. With
no comparative basis provided by the statistics for smaller cities and no
individual reports for states, the remaining reports were of greatly reduced
value. A detailed analysis was therefore made of the needs for data in this
field and of the Bureau’s past and present inquiries. Closely related reports
were prepared for the director by the Central Statistical Board, the Advisory
Committee to the Director of the Census, and the Municipal Finance
Officers’ Association of the United States and Canada. Accordingly, the
Division of Financial Statistics of States and Cities was reorganized in 1936.
Annual Report of the United States Bureau of the Census, 1937, pp. 23-24;
1938, pp. 28-29.





For subsequent biennial dates the census of business was
further developed. The census of business covering the calendar
year 1935, for example, was much broader in scope than either
the census of distribution of 1929 or the census of American
business for 1933. The 1935 census of business attempted to
obtain a reasonably complete picture of essential and comparable
items of business information concerning practically all
lines of business activity in the United States. It comprised
a complete census of retail and wholesale trade, service businesses,
amusement enterprises, hotels, broadcasting stations, advertising
agencies, banking, insurance, real estate, bus transportation,
trucking, warehousing, construction, and distribution of
manufacturer’s sales through primary channels.

Elaborate care was exercised in preparing the 17 schedules;
before final use they were submitted for criticism to
representatives of the business groups and governmental agencies
principally concerned. Special efforts were made by the Bureau to
integrate the census of business and the biennial census of
manufactures by the adoption of common definitions, instructions,
area designations, and field procedures. In order to
perfect procedure, conferences were held to discuss schedules,
procedures, and other problems inherent in such an expanded
business census. These conferences were attended by representatives
of trade associations, professional groups, chain store
organizations, etc., and by official representatives of a number
of governmental agencies: the Central Statistical Board,
Interstate Commerce Commission, Bureau of Foreign and
Domestic Commerce, Tariff Commission, Federal Reserve
Board, and Bureau of Labor Statistics.¹

The population schedule for the census of 1940 is notable
for a number of new questions concerning employment status,
migration, income status, housing, and education. It is also
notable for the innovation of the sampling technique applied
to one group of questions in order to widen the scope of the
inquiries. It dropped the question on literacy.

Employment and unemployment queries have been made in
previous censuses, but the 1940 census made a new approach.
The new data permit classification of the nation’s labor force
into the employed, the unemployed who have had previous work
experience, and the unemployed without previous work experience
(new workers). They provide some measure of the volume
of employment both during the whole year and during the week
prior to the census day, Apr. 1, 1940.

1 Annual Report of the United States Bureau of the Census, 1930, pp.

The schedule included questions that distinguish people at
work, people unemployed who are seeking work, and people who
have a job but are not at work because of temporary illness,
industrial disputes, or vacations. Persons at work were asked to
indicate the number of hours they worked during the week
preceding the census, and the unemployed were asked to state
the number of weeks they had been seeking work. Workers
were classified as to whether they were in private industries or
were employed by the government and whether they were
own-account workers or unpaid family workers.

The new inquiry on wages and salaries is important as a 
measure of national purchasing power and its distribution, and 
the resulting data have been helpful to business in indicating 
potential market areas. 

The net effects of internal population migration during the
preceding 5 years were obtained by requesting the place of
residence for each person as of Apr. 1, 1935. It is expected that
compilation of the statistics comparing such residence with that
of Apr. 1, 1940, which is also recorded on the schedule, will
measure the effects of industry shifts, droughts, depressions,
floods, the backflow west to east, and the shift from the city to
the country, or vice versa.

In 1940, for the first time, the decennial census included a
separate housing schedule designed to give detailed information
for each dwelling unit in the United States, whether occupied or
vacant, rural or urban. Data were obtained as to the number of
rooms, water supply, bath and toilet facilities, and light
equipment. For each occupied unit or household, information was
obtained concerning the principal means of refrigeration used,
the presence or absence of a radio, the character of the heating
equipment, and the principal heating and cooking fuels used.
Each residential structure was described in respect to single,
double, or multiple family occupancy, whether or not it contained
a business unit, for what purpose and in what year it was
originally built, the principal exterior material of the structure,
and whether it was in need of major repairs. The schedule included
a question on whether the family leases or owns, whether there
is mortgage indebtedness, and methods of home finance.

It is expected that the compilation of these data will provide
valuable information on the latent purchasing power of a
community. There is no more important index of the social and
economic status of a population than the standard of its housing.
Housing experts believe that the information gathered may be of
inestimable value in determining future housing policies. It
will be of especial interest to manufacturers, builders, distributors,
and bankers in their study of trends in home ownership and
building in the United States. Cities will be able to determine
the distribution of the various types of housing within their
limits, together with the possible need of expansion of
transportation and communication systems, police and fire protection,
schools, and similar facilities. Data showing the equipment in
houses, together with the state of repair of the homes, will be of
value to manufacturers and distributors of housing products
in the planning of their sales campaigns.¹

¹ Cf. The New York Times, Jan. 24, 1940.

The agricultural schedules for the census of 1940 likewise had
a number of new features. Nine regional schedules, each used
in a separate group of states, were especially designed to fit
national variations in cropping practices. Questions designed
to obtain subtotals for the value of various major categories of
farm products sold or traded in 1939 made possible a much
closer estimate of total farm income and of farm income by
principal sources. The 1940 census also introduced a supplementary
plantation schedule for use in the cotton belt that made
possible a refined distinction between farms and plots cultivated
by croppers and defined the exact status of each cropper and
certain other tenants in relation to the plantation owner.
Questions to measure the effects of current agricultural policies were
also asked, relating to soil-improvement crops, summer fallow,
crop failure, and succession or interplanted double cropping.

The Bureau of Foreign and Domestic Commerce is the great
Federal fact analyzer and fact publisher in the Department of
Commerce. It has a curious and rather complicated history.
From the beginning of the national period, the statistics of
foreign commerce were linked up with our tariff policy and
maintained by the Treasury Department. In 1856, growing out of
an investigation of the tariff policies of other countries by the 
State Department, there was created a Bureau of Foreign Com- 
merce as a permanent bureau for the purpose of collecting 
statistics on foreign trade. In 1866, the Bureau of Statistics 
of the Treasury Department was created to take special charge 
of this work, and at the same time Congress gave it power to 
collect statistics on domestic trade as well as on foreign trade. 
In 1905, a Bureau of Manufactures in the Department of Com- 
merce was organized to foster, promote, and develop the various 
manufacturing industries of the United States, and markets for 
the same at home and abroad, by gathering and publishing all 
available and useful information concerning industries and 
markets. 

As a consequence, there were bureaus in three separate depart- 
ments (Treasury, State, and Commerce) concerned with the 
gathering of foreign-trade statistics. In 1912, however, these 
functions were centralized in the Bureau of Foreign and Domestic 
Commerce of the Department of Commerce. 

The most important statistical publications of this bureau
are the monthly Survey of Current Business (with a weekly
supplement) and the annual Statistical Abstract of the United States.
Special publications designed to aid business are also prepared,
for example, historical studies of industries, studies of the national
income produced, and studies of market data.¹

Other bureaus of the Department of Commerce are the 
Bureaus of Fisheries, of Patents, and of Navigation and Steam- 
boat Inspection, each of which publishes specialized statistics. 
The two great statistical organizations in the Department of 
Commerce, however, are the Bureau of the Census and the 
Bureau of Foreign and Domestic Commerce. 

¹ Illustrations are P. W. Barker, Rubber Industry of the United States,
1839-1939 (1939); Division of Economic Research, National Income in the
United States, 1929-35 (1936); B. P. Haynes and G. R. Smith, Consumer
Market Data Handbook (1939). For other statistical publications of the
Bureau of Foreign and Domestic Commerce, see the United States Government
Manual.

Department of Labor. The United States Department of Labor
also contains bureaus that publish statistics, the most important
from the point of view of quantity of data compiled and published
being the Bureau of Labor Statistics. This was created
in 1884 as the Bureau of Labor, although the Treasury Bureau
of Statistics, created in 1866, had been enjoined to collect wage
statistics. In 1888 the Bureau of Labor was made an independent
Department of Labor. The duties of the Department
of Labor were to acquire and diffuse among the people of the
United States useful information on subjects connected with
labor in the most general and comprehensive sense, and especially
on its relation to capital, the hours of labor, the earnings
of laboring men and women, and the means of promoting their
material, social, intellectual, and moral prosperity. The
commissioner of labor in charge of the Department was specially
charged to investigate the causes of and facts relating to all
controversies and disputes between employers and employees,
and he was also empowered to make special studies of articles
controlled by trusts and their effect on production and prices
and other special subjects. Owing to the excellent work of the
Department under the wise guidance of Carroll D. Wright, the
first commissioner of labor, there is available a large mass of
statistics in the field of labor for this country, including studies
of strikes, the effect of the introduction of machinery on
employment and wages, the conditions of living and work of the laboring
population, etc. Upon the basis of the wage and price data
collected, index figures showing the trends of wages and prices,
wholesale and retail, have been constructed and published by
this bureau.

In 1903 the old Department of Labor was transferred to the
newly created Department of Commerce and Labor, but in 1913
there was created a new Department of Labor and in that
department the Bureau of Labor Statistics. At the present
time the principal publications of the Bureau of Labor Statistics
are the Monthly Labor Review (published since 1915), bulletins
on special topics such as wholesale prices, retail prices, cost of
living, wages, and labor turnover, and monthly serials to
supplement the bulletins and give current information on those topics.
Beginning in August, 1939, the Bureau of Labor Statistics
published a daily index of 28 basic commodity prices at wholesale,
but following the inauguration of wartime price controls this
index was published only once a week, since control in the raw
material field was widely effective. During wartime the index
was of little importance.¹

Treasury Department. For the period before the Civil War 
the chief source of financial and price statistics in the United 
States, as well as data on governmental finance, consists in the 
finance reports of the Secretary of the Treasury. 

Before the development of statistical bureaus in the Department
of Commerce and the Department of Labor, the Treasury
Department was the most important source of Federal statistics;
and it is still important in the fields of banking and monetary
statistics, owing to the work of the comptroller of the currency,
and in the field of income and Federal taxation and indebtedness,
owing to the work of the commissioner of internal revenue and
the Secretary of the Treasury.

From the United States Treasury Department comes the
monthly Statement of the Public Debt of the United States.
The commissioner of internal revenue of the Treasury publishes
an annual report of income-tax returns, constituting the most
important source of data regarding income statistics in the
United States. The annual reports of the comptroller of the
currency give financial and banking statistics and monetary
data going back as far as the Civil War, when the national
banking system began. The comptroller publishes these data in
an annual report and also several times a year in the Abstract of
Condition of the National Banks.² The annual reports of the
director of the mint contain statistics on the production of the
precious metals, including gold and silver. The Life Saving
Service of the United States Treasury Department publishes
data on marine accidents.

¹ For other statistics published by the Department of Labor, see the United
States Government Manual and see also Bureau of Labor Statistics, Selected
List of Publications of the Bureau of Labor Statistics (1939), which can be
purchased from the Government Printing Office.

² See page 82 on the Federal Reserve System.

Interior Department. The Department of the Interior has
important statistical aspects, too. The Bureau of Mines
publishes data on fatalities in coal mines. The Geological Survey
publishes data on metal statistics and minerals. In the census
years it has authority to collect statistics from primary sources.
Since 1880 it has collected statistics carefully as to the crude
oil lifted from the ground, iron ore, etc., watching the physical
consumption of our natural wealth. It also collects and
publishes statistics on electrical power production, which are now
considered useful in the study of the general trend of business,
so important to business is the use of electricity. Other bureaus
in the Department of the Interior are the Bureau of Education,
Bureau of Pensions, and the Bureau of Indian Affairs, each
publishing certain specialized statistics indicated by their titles.

Department of Agriculture. The Department of Agriculture
was not founded until 1862, but statistical work relating to
agriculture of a more or less systematic nature dates back to 1839,
when Congress appropriated $1,000 out of the patent fund, to be
expended under direction of the commissioner of patents, “in
the collection of agricultural statistics, and for other agricultural
purposes.” At the present time the great bulk of Federal
statistics on agricultural matters is collected and published by
the Bureau of Agricultural Economics, which originally was the
Bureau of Statistics in the Department of Agriculture and later
was known as the Bureau of Markets and Crop Estimates. In
addition to a host of bulletins on special subjects related to
agriculture, this bureau publishes a monthly report on weather
conditions, Crops and Markets, and gives out estimates of annual
crop yields. In recent years it has become the source of pioneer
statistical work in the measurement of the factors influencing the
demand for agricultural products and other similar statistical
studies in connection with the conduct of the Agricultural
Adjustment Administration. The Agricultural Yearbook, published
by this Department, is a valuable record of agricultural progress in
the United States and contains also extensive summaries of
agricultural statistics. Since 1936 these summaries have been
published separately under the title Agricultural Statistics.
Current agricultural data are disseminated by the Department
of Agriculture in its monthly publication, the Agricultural
Situation. The Bureau of Agricultural Economics, which has
direct charge of the above, also furnishes part of
the program for the Farm and Home Hour on the radio, designed
to distribute timely agricultural information to the farming
population of the nation.

The administrative departments of the government thus
constitute sources of statistics on a large scale, and statisticians
continually make use of these Federal sources of statistics.
These publications of the government are available to everyone at 
very low cost and can be found for free use in most large libraries 
of the country or at offices maintained for the purpose by the 
government. 

The Independent Establishments. In addition to the
administrative departments of the national government there are many
national commissions or boards or agencies, collectively described
as the “independent establishments” of the government. Some
of these have become well-known sources of statistical data in
special fields. The principal ones are the Interstate Commerce
Commission, the Federal Trade Commission, the Federal Security
Agency, the Federal Power Commission, the Federal Deposit
Insurance Corporation, the Securities and Exchange Commission,
the Tariff Commission, the Maritime Commission, and the Board
of Governors of the Federal Reserve System.

The Interstate Commerce Commission was created in 1887 
as the Federal government’s solution of the railroad problem, 
following detailed Congressional reports of the situation, known 
as the Windom Report (1873-1874) and the Cullom Report 
(1886). These reports may be said to be the beginning of Federal
railroad transportation and communication statistics. Since
1887, such statistics have been gathered and published by the
Interstate Commerce Commission, its powers having been
gradually extended to include other types of transportation, oil
pipe lines, and express companies. In 1934 Congress created the
Federal Communications Commission, which is devoted primarily
to telephone, telegraph, cable, and radio.

The Federal Trade Commission is the Federal source of data on
the monopoly problem. In 1890 the Sherman Antitrust Act
was passed; and in 1903 Congress realized that there was need to
collect facts to be used as a basis for the enforcement of the
Sherman Act. At the urgent request of President Roosevelt,
Congress created the Bureau of Corporations for the purpose of
gathering data that would aid in the proper enforcement.
Following the passage of the Federal Trade Commission Act of 1914,
the Bureau of Corporations was merged with the Commission.
This Commission publishes reports on its investigations of various
trusts, such as the investigation of coal, cotton, cereals, meat
packing, and a number of others. During the 1920’s and 1930’s
it was a collector and publisher of statistics concerning trade
associations and trade practices.

The Board of Governors of the Federal Reserve System, which
has operated since 1913, has become the greatest national source
of statistics on banking and financial subjects. It publishes an
annual report containing statistics on banking and related
subjects, the Member Bank Call Report several times a year, and the
Federal Reserve Bulletin, a monthly publication invaluable to
bankers and statisticians working in banking subjects. In
addition, it publishes weekly mimeographed press releases on the
condition of Federal reserve banks and of reporting member banks
in order to make available more current data than is possible
with the monthly or annual publications. In addition to financial
and banking statistics, the Board also has constructed, through
its Division of Research and Statistics, an index of production
calculated upon a comprehensive basis; this index and other
special studies are also published in the annual reports and in the
Federal Reserve Bulletin.

The United States Tariff Commission, created in 1916, gathers
statistics purporting to aid in the administration of the tariff
laws and to help determine when duties should be raised or
lowered. Owing to the strong influence of politics upon the
question of the tariff, the studies of the Tariff Commission, with
certain notable exceptions, constitute a great source of misuse
of statistics. This was particularly true for the period from
1920 to 1932, when most of its studies were for the purpose of
proving the need to raise tariffs. After the passage of the Reciprocal
Trade Agreements Act in 1934, extensive improvements were
inaugurated, and additional data were made available with the
numerous studies that were conducted in cooperation with the
State and other governmental departments.

Finally, in connection with Federal statistics, it should be
mentioned that frequently Congressional investigations result in
the assembly and publication of valuable statistical material, often
constituting original sources or at least original compilations of
such material. Mention has already been made of the Windom
Report in 1873-1874 and the Cullom Report in 1886, both on
transportation, which led to the creation of the Interstate
Commerce Commission in 1887. Other examples are the Pujo
Money Trust Report of 1913 and the various reports of the Senate
and House Committees on Banking and Currency during the
1930’s on brokers’ loans, branch banks, the operation of the
national and Federal reserve banking systems, foreign loans, and
stock-exchange practices. Important Federal legislation of that
decade was based on these investigations.

Several noteworthy special commissions, created by Congress
from time to time, have produced published documents that have
become famous as great sources of primary statistical information.
The Aldrich Reports from the Senate Committee on Finance,
on Retail Prices and Wages (1892) and Wholesale Prices, Wages,
and Transportation (1893) constitute extensive compilations of
price data covering a period of over fifty years. These reports
have been extensively used as source material for statistical
studies of prices and wages for the period 1850 to 1900.

The Industrial Commission, created by act of Congress of
June 18, 1898, submitted a report to Congress in 1902, consisting
of 19 volumes and presenting a substantially complete epitome of
the industrial life of the nation and of the important changes in
business methods that occurred in the latter part of the nineteenth
century. These volumes are largely statistical in their methods
of description. The Immigration Commission, created in 1907,
presented to Congress in 42 volumes a full inquiry into the
subject of immigration, reviewing statistically immigration to the
United States during the period 1820 to 1910 and the component
elements in our population as determined by immigration
from 1850 to 1900. The National Monetary Commission,
created in 1908, studied the banking and currency systems of
the United States as compared with those of other countries.
This Commission collected more complete statistical information
with regard to the banks of foreign countries such as Great
Britain, France, and Germany than had ever been collected
before and for the first time in this country obtained comparable
statistics for all banks in the United States. The full
report of the Commission, consisting of 24 volumes, was
completed in 1912 and served as the basis of the bank-reform
legislation known as the Federal Reserve Act.

Other similar statistical studies in various fields of economic
and social life have been made by commissions, such as those of
the Select Committee on Wages and Prices established in 1910,
the Commission on Industrial Relations created by an act of
1912, and the Commission on National Grants to Vocational
Education. The Hoover Committee on Social Trends (1933)
published extensive studies, partly statistical in character, of the
economic and social life of the nation.

One of the most notable of such temporary organizations was
the National Resources Planning Board, established in the
executive office of the President of the United States under
authority of the Reorganization Act of 1939. This Board
succeeded the National Resources Committee, which had been
established in 1935. Earlier names of the same organization
were National Resources Board and Advisory Committee and
National Resources Board, which was created in 1934 to succeed
the planning organization of the Federal Emergency Administration
of Public Works. When the United States Congress
discovered what it felt was an attempt by the executive to usurp
Congressional powers by having an economic planning board,
it became hostile to the National Resources Planning Board.
This hostility was not diminished when in 1943 the Board
presented to the executive a plan for the postwar expansion of the
Federal security program. President Roosevelt handed the
report over to Congress for action, but the Board was abolished
in that year when Congress refused to vote funds for its
continued existence. During the course of its checkered career,
however, the Board became the author of several noteworthy
statistical publications: Energy Resources and National Policy
(1939), The Problems of a Changing Population (1938), Consumer
Incomes in the United States (1938), Consumer Expenditures
in the United States (1939), and The Structure of the American
Economy (1939).

State and Municipal Sources. The activities of the various
state governments result also in the compilation and publication
of statistics. Most states maintain departments of institutions
and agencies that, through supervision of reform schools, prisons,
hospitals, and the like, become sources of statistics on mental
and physical pathology, as well as delinquency. Data concerning
the records of penal and charitable institutions, hospitals,
and asylums for the insane and feeble-minded are primarily
recorded by state or by municipal organizations.

Vital statistics, that is, data relating to births and deaths and
the classification of deaths by causes, have become an important
part of the demographic work of municipalities and states and
have thus made the state and municipal governments important 
primary sources of data of this character. In addition, statistics 
on marriage and divorce are recorded through state and munici- 
pal licensing administration. 

Data are recorded by states and regularly reported, based on
their tax-collecting, licensing, and registration responsibilities.
For example, statistical data result from automobile registration
by states.

State incorporation laws result in the accumulation of data.
State incorporated banks and trust companies and building and
loan associations, for example, are all regulated by the banking
departments of the various states, and statistics regarding these
institutions are regularly compiled and published by these
departments. Similarly, life insurance, fire insurance, automobile
and casualty insurance, and workmen’s compensation laws
and social-security laws have resulted in state-regulating bodies
and the compilation and publication of statistical data on
financial, commercial, and industrial subjects.

A number of the larger and older of the industrial states have 
highly efficient labor departments, which compile and publish
statistics of industrial conditions. Of increasing importance and 
interest to social scientists is the development of the volume of 
statistics relating to industrial accidents and diseases, growing 
out of the need for such statistics in the administration of the 
workmen’s compensation laws. 

The regulation of public utilities and water companies and
street-railway and bus companies by state and municipal authorities
has made the public-utility commissions of the states the
principal primary sources of statistical data on these important
industries, although in the 1930’s many of these data were
gathered by the Federal Power Commission and the Securities
and Exchange Commission.

WORLD STATISTICS 

Under the League of Nations progress has been made in the
collection and publication of world statistics. These are
published in the Monthly Bulletin of Statistics of the League of
Nations and also in its International Statistical Yearbook and
its annual World Economic Survey. Statistics on world
commercial banking and finance were published in special League
publications. Previous to the work of the League of Nations
in this respect, the World Almanac had for many years been
highly valued as a rough and ready source of a variety of world
statistics and still constitutes a popular source.

The Statesman’s Yearbook, published by Macmillan Company,
Ltd., London, is a statistical and historical annual of the
states of the world, giving data on population, area, finance,
commerce, and banking, as well as figures on the fleets of the
world and the world’s shipping. It has been issued annually
since 1864. The United States government has always shown
considerable interest in statistics of foreign countries and has
published them along with the domestic data, but this practice
has been far more systematic and thorough since the First World
War. For example, the Federal Reserve Bulletin regularly
publishes statistics of prices, banking, and currency conditions
in the principal nations of the world; foreign price statistics are
published by the Bureau of Labor Statistics in its special bulletins;
and statistics on trade between other countries, that is, the trade
of the world outside the United States and not with the United
States, are published by the Department of Commerce in Vol. 2 of
the Commerce Yearbook (as well as the statistics of our own foreign
trade). In 1938 the Paris International Chamber of Commerce
published a brochure on the economic statistics in 20 countries.

In addition to such collections of statistics for all or a majority
of the countries of the world, mention should be made of the
sources, in greater detail than the world volumes, for statistics
concerning three of the important countries of Europe. For
England and the Dominions there is the Statistical Abstract for
the British Empire, published by the Board of Trade. This
combines what was previously published in the Statistical
Abstract for the United Kingdom (first issued in 1864 for the years
1840-1853) and the Statistical Abstract for the Several British
Oversea Dominions and Protectorates (first issued in 1864 for the years
1850-1863). The French government publishes Annuaire
statistique (1878) and the Bulletin de la statistique générale (1911). In
Germany the official source of statistics is the Statistisches Jahrbuch
für das deutsche Reich (1880).

It has long been recognized that international statistics would
be extremely important in obtaining true international, political
and economic understanding and cooperation. Consequently,
for many decades, efforts have been made to arrive at some sort
of international understanding on methods to make the
compilation of international statistics feasible or at least to improve
existing world statistics. The statistics of each country are
gathered according to the needs of that country; and since the 
problems in respective countries differ, so do the statistics. 
Their compilation and classification, according to varying 
definitions of units and varying bases of classification, produce 
startling differences in the final results. Then, too, the economic 
organizations of the various countries are different. A country 
with a large amount of transit trade and heavy reexportation
of goods imported needs a different sort of classification of foreign 
trade statistics than a country doing little reexport business. 
Furthermore, the statistics themselves are gathered and organized 
in diverse ways in the various countries ; the methods of collecting 
the statistical raw materials, the periods for which these data are
gathered, and the methods of classification are not the same in 
the various countries. 

The endeavors made in the last eighty years for better inter- 
national statistical information, therefore, were first concentrated 
on the problem of rendering national statistics more comparable, 
since national statistics must be comparable between the various 
nations before they can be added up or compared to obtain
international or world statistics. Quetelet, the Belgian who did
so much to organize comparable international astronomical
observations, was likewise the first to try to solve the problem
of obtaining the fundamental basis for better world and inter- 
national statistics. It is principally due to him that the First 
International Statistical Congress was organized in 1853 in 
Brussels. The main purpose of this Congress, the members of 
which attended in their private and not in their official capacity 
(although some were officials), was to bring about some degree 
of comparability in national statistics between the various 
nations. 

Another attempt to obtain international cooperation in 
statistical work was made in 1887 when the International Statis- 
tical Institute was formed. This organization, still in existence, 
elects members who are active in statistical work as professors, 
government officials, or members of private statistical offices. 





The Institute cannot bind its members or the national governments
of its members but makes progress by suggesting improvements
to different countries.

The first official or semiofficial attempts for better world
statistics were made in 1875 through the establishment of the
International Bureau of the Universal Postal Union and the
Bureau of the International Telecommunication Union (originally
called the International Bureau of the Telegraph Union).
Both regularly gather statistics on postal and telegraphic developments.
Similar efforts in another field were made for the first
time in 1882 by the International Congress for Hygiene and
Demography. In 1905 another significant official attempt was
made for greater comparability in world statistics. In that year,
at the suggestion of the United States government, a meeting
was held in Rome to formulate some plan for obtaining uniformity
of agricultural statistics. This meeting led to the founding
of the International Agricultural Institute, which still is active
in the gathering of world statistics on agriculture, production,
consumption, prices, and trade. The statistical information
assembled by this body is published monthly and yearly, and
special publications are also issued. Sixty-two different countries
are members of the Institute. The Institute was very
successful in putting national agricultural statistics on an
internationally more comparable basis and in assembling regularly
good and reliable world statistics on all fields of agriculture.

Since the First World War the League of Nations has been
the natural organization to proceed with the work of internationalizing
statistics. Shortly after its establishment the League
started that work. At the International Economic Conference
of 1927 the problem of comparable national statistics in order to
secure good world statistics was studied. The League of Nations
subsequently brought about an official meeting on the subject
of international statistics and called an International Statistical
Conference to meet in Geneva in November, 1928. The keynote
of the Conference was that the general adoption of comparable
international statistics was desirable for good international
policies and in the interests of permanent world peace. The aim
of the Conference was to bring about the broadening of the scope
of national statistics in all countries where it seemed to be needed
and to attempt to make national statistics in different countries





comparable. The Conference emphasized once more that such
attempts meet with many difficulties. Of the 42 countries represented
(some nonmembers of the League, like the United States,
were also represented), only 29 countries felt they could sign the
Convention and Protocol of the Conference. To induce that
number to sign, it was necessary to limit greatly the program of
work.

Nevertheless, the Conference of 1928 did produce good results.
A number of points were discussed, and important conclusions
were reached. In addition, the Conference created a committee
of technical experts to meet from time to time and make suggestions
for further progress. This group met in March, 1931, and
formulated a constitution for future work. It met again in
December, 1933, to discuss problems of statistics on foreign
trade. Up to the present time, its contribution to the solution
of the problems involved has been inconsiderable, but it may
make advances in this important work if the countries concerned
will be willing to carry out the recommendations made by it, as
they are apparently committed to do by the Convention and
Protocol of the Conference of 1928.

In 1936 the twenty-third session of the International Institute
of Statistics was held at Athens. At that session there were
75 members, of which 10 were from North America. Twenty-seven
countries designated official delegates. Also, the Secretary
of the League of Nations, the International Labor Office, the
International Institute of Intellectual Cooperation, the International
Institute of Agriculture, and the International Chamber
of Commerce were represented.*

In May, 1940, one of the 11 sections of the Eighth American
Scientific Congress convened by the government of the United
States in connection with the observance of the fiftieth anniversary
of the founding of the Pan American Union was devoted to
statistics. The program of the section had the following broad
objectives: (1) improvements in the comparability of official

*Stuart, Prof. C. A. Verrijn, “La XXIIIeme session de l’Institut
international de statistique, Athenes, 1936,” Revue de l’Institut international
de statistique, vol. 4 (1936), pp. 367-403. The citation includes the summary
of resolutions of the session (pp. 378-395) and communications from various
delegations on methods, legislation, organization, and administration of
statistics (pp. 396-403).





statistics among the American nations, (2) improvements in
statistical methodology, (3) the furtherance of acquaintance
among the statisticians of the American continent, (4) consideration
by these statisticians of the possible development of a
continuing professional medium for the interchange of statistical
ideas and information. Correspondents in several of the
American nations had pointed to the need for closer professional
collaboration among the statisticians of this hemisphere,
and it was proposed to explore at this meeting the possibilities
of establishing some kind of an inter-American statistical organization
of professional character. The result was the formation
of the Inter American Statistical Institute.

A new quarterly, the Estadística, published in Mexico, is the
official organ of the Inter American Statistical Institute, constituting
one of its mediums for fostering statistical development
in the Western Hemisphere. It endeavors to acquaint the
persons in one country with statistical developments in other
countries, to inform its readers concerning the availability of
data, to present articles that will tend to encourage the adoption
of improved methods, and hence to improve the quality of data.
Articles may appear in any of the following four languages:
Spanish, English, Portuguese, or French. An author's summary
accompanies each article; the summary is reproduced in
several languages. The Inter-American Statistical Institute
also publishes a yearbook of statistics, including statistical data
for American countries and ^«o^th America.

Prospects to secure comparable world statistics and for international
statistics fluctuate with the rise and fall of isolationism
and nationalism. Under the League of Nations and under the
Pan American Union progress has been encouraged, only to be
hampered by ever persistent isolationism in one country or
another. Nevertheless, the need for comparable data with
respect to all nations of the world has become more and more
evident; it has come to be more and more appreciated as the
problems have arisen in conferences and committees, and more
and more is it coming to be realized that such statistics are a
pressing necessity to businessmen with interests spread far and
wide over the international field.





While it has been stressed in this section that there are as yet
no truly comparable international statistics, the student of
international affairs and the international businessman will be
able to obtain what constitutes for the present the closest
approximation to them from a number of sources, chief among
them the following: (1) International Statistical Yearbook (published
by the League of Nations); (2) Vol. 2 of the Commerce
Yearbook (published by the United States Department of
Commerce); (3) the International Appendix to the Statistics
Yearbook of Germany (Statistisches Jahrbuch für das Deutsche
Reich); (4) the Statesman's Yearbook. The World Peace Foundation
publishes also a subject index to the economic and financial
documents of the League of Nations.



CHAPTER IV 

PRESENTATION OF STATISTICS 
TABLES

Principles of Tabulation. Tabulation is the mechanical part
of classification. Its function is so to arrange the physical presentation
of quantitative facts that there can be no misinterpretation
of their significance. The attainment of this object
depends upon the following principles:

1. Concise, clear, and complete titles attached to the table.
Usually the title is placed at the top above the table, but it is
sometimes placed at the bottom. The function of the title is to
give a general description of the contents of the table.

2. Careful, unambiguous description of the units of measurement
or presentation used in the collection and recording of the
data. This is ordinarily placed immediately under the title.
Subheadings frequently require definition of units.

3. The arrangement of the data in columns and rows according
to a clearly indicated basis for classification.

4. The exact description of columns and rows by the use of
caption headings and stub headings.

5. Footnotes to clarify headings or subtitles or to specify
limitations of particular figures.

The scheme shown on page 93 gives an abstraction of the
mechanics of tabulation. It shows the position of the title and
the description of units above the table and, for illustration,
designates four columns, numbered (1), (2), (3), and (4), and
three rows, lettered (x), (y), and (z).

The four columns are subcolumns: (1) and (2) are subcolumns
of column (a), and (3) and (4) are subcolumns of column (b).
The caption headings would appear in the spaces designated
(a) and (b), respectively, and subcaption headings would appear
in the spaces designated (1), (2), (3), and (4). Similarly, the
three rows are described by stub headings appearing in (x), (y),
and (z). The space (p) is for the general description of the stub




headings. It is possible also to have stub subheadings. In 
order to illustrate further, there is reproduced in Table 1 on 
page 94 data compiled from the replies to the questionnaire 
shown on pages 46-47. 


                     TITLE
             (Description of units)

  (p)   |      (a)      |      (b)
        |  (1)  |  (2)  |  (3)  |  (4)
 -------+-------+-------+-------+-------
  (x)   |       |       |       |
  (y)   |       |       |       |
  (z)   |       |       |       |
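
The mechanics of the scheme can also be sketched in code. The following Python sketch is an illustration only, under stated assumptions: the function name `render_table`, its parameters, the `(stub)` label, and the sample cell values are hypothetical and do not come from the text. It merely places a title, a description of units, caption and subcaption headings, and stub headings in the positions the scheme describes.

```python
def render_table(title, units, stub_head, captions, stubs, cells):
    """Lay out a plain-text table following the scheme of tabulation.

    captions: list of (caption heading, [subcaption headings]) pairs.
    stubs:    list of stub (row) headings.
    cells:    one list of values per stub, in subcaption order.
    All names here are illustrative assumptions, not the authors' notation.
    """
    sep = " | "
    subcaptions = [sub for _, subs in captions for sub in subs]
    stub_w = max(len(s) for s in [stub_head, *stubs])
    values = [str(v) for row in cells for v in row]
    col_w = max(len(s) for s in [*subcaptions, *values])

    def span(n):
        # Width taken up by n adjacent columns, separators included.
        return n * col_w + (n - 1) * len(sep)

    # Widen the columns if a caption heading is longer than its span.
    for caption, subs in captions:
        while len(caption) > span(len(subs)):
            col_w += 1

    total_w = stub_w + len(sep) + span(len(subcaptions))
    lines = [
        title.center(total_w).rstrip(),           # title above the table
        f"({units})".center(total_w).rstrip(),    # units directly under the title
        sep.join([stub_head.ljust(stub_w)]
                 + [c.center(span(len(subs))) for c, subs in captions]).rstrip(),
        sep.join([" " * stub_w]
                 + [s.center(col_w) for s in subcaptions]).rstrip(),
        "-" * total_w,
    ]
    for stub, row in zip(stubs, cells):
        lines.append(sep.join([stub.ljust(stub_w)]
                              + [str(v).rjust(col_w) for v in row]))
    return "\n".join(lines)


print(render_table(
    title="TITLE",
    units="Description of units",
    stub_head="(stub)",
    captions=[("(a)", ["(1)", "(2)"]), ("(b)", ["(3)", "(4)"])],
    stubs=["(x)", "(y)", "(z)"],
    cells=[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],  # made-up values
))
```

Keeping the headings as separate arguments mirrors the principle that every column and row must be described exactly; footnotes, the fifth principle, are left out of this sketch.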


General-purpose and Special-purpose Tables. A mere glance
at the specimen taken from the publication of the United States
Department of Agriculture is sufficient to lead to the conviction
that such tables are not meant for light reading. They are
essentially reference tables, or general-purpose tables. The principal
guide in the construction of general-purpose tables is to
include as much as possible in as small a space as possible, consistent
with presentation of the amount of information deemed
necessary. Thus the tables contained in such publications as
the United States Census reports or the Federal Reserve Bulletin
or the Survey of Current Business may not constitute popular
reading; but they are a great boon to all who seek ready access
to details, arranged in a manner so facilitating their discovery
by the careful observer that looking up a particular figure is
almost as easy as looking up a word in the dictionary.

When a table is to be read, that is, when it is to tell a story, it is
called a special-purpose table. Such a table should have as its
outstanding characteristic the quality of simplicity. It should not
try to tell too much at once; if necessary, more than one table
may be used for telling a more complex story. Special-purpose
tables should have a great deal of white space in and around
them to make lazy readers (and most people are lazy when it
comes to reading tables of figures) think them easy to read.
The type or print should be sufficiently large for easy reading.
The reader should be adequately prepared or oriented to the










table by the text accompanying it and particularly by the title
of the table. Briefly, the story of the table should be told in
literary form in the text, reliance being placed on the table


Table 2. Average Disbursements of Consumer Units¹ in Each Third
of Nation, 1935-1936

(Average disbursements of families and single individuals, in dollars.
Lower third: incomes under $780. Middle third: incomes of $780-$1,450.
Upper third: incomes of $1,450 and over.)

                                   Average disbursements      Percentage of income
Category of disbursement            Lower   Middle    Upper    Lower   Middle    Upper
                                    third    third    third    third    third    third

Current consumption:
  Food                               $236      ...     $642     50.2     37.5     21.7
  Housing                             115      ...      ...     24.4     18.5     13.8
  Household operation                  54      ...      ...     11.4      ...      8.1
  Clothing                             47      ...      251     10.0      9.5      8.5
  Automobile                           16       57      215      3.3      5.3      7.2
  Medical care                         20       41      ...      4.3      3.9      3.6
  Recreation                            9       28       89      1.8      2.6      ...
  Furnishings                           9       28       72      1.8      2.6      2.4
  Personal care                        12       22       44      2.5      2.1      1.5
  Tobacco                              10       23      ...      2.2      2.1      1.4
  Transportation other than auto       11       19       37      ...      1.7      1.3
  Reading                               6       12       23      1.3      1.2      ...
  Education                             2        7      ...      ...      ...      ...
  Other items                           3        6       15      ...      ...      ...
    All consumption items            $550      ...   $2,212    116.7     98.1     74.8
Gifts and personal taxes²             $13      $39     $181      2.8      3.7      6.1
Savings                               -92      -19      566    -19.5     -1.8     19.1
    All items                        $471   $1,076   $2,959      ...      ...      ...

¹ Includes all families and single individuals, but excludes residents in institutional groups.
² Taxes shown here include only personal income taxes, poll taxes, and certain personal
property taxes.

Source: National Resources Committee, Consumer Expenditures in the United States,
Estimates for 1935-36 (1939), p. 40.

merely as a dramatic summary. Simple devices to aid interpretation
and facilitate the mental vision of the table have a
useful place in special-purpose tables, such as accompanying
relative figures, methods of emphasis such as italics, or the
scheme of ruling the table.
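
The "accompanying relative figures" mentioned above are simply each amount expressed as a percentage of a base. The following Python sketch illustrates the arithmetic under stated assumptions: the function name `percent_of_income` is hypothetical, and the dollar amounts are the lower-third averages as they appear to read in Table 2. Because the printed averages are themselves rounded, a recomputed percentage may differ from the printed one by a tenth of a point.

```python
def percent_of_income(disbursements, income):
    """Express each disbursement as a percentage of income, to one decimal."""
    return {item: round(100 * amount / income, 1)
            for item, amount in disbursements.items()}


# Lower-third averages in dollars, as read from Table 2.
lower_third = {
    "All consumption items": 550,
    "Gifts and personal taxes": 13,
    "Savings": -92,
}
# Consumption plus gifts and taxes plus savings gives the "All items" figure.
income = sum(lower_third.values())

for item, pct in percent_of_income(lower_third, income).items():
    print(f"{item:<26}{pct:>7.1f}")
```

A negative savings figure carries through to a negative percentage, which is why consumption in the lower third can exceed 100 per cent of income.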













The object of a special-purpose table may also be to compress
into a small space a body of information “the narration of which
in the text would be cumbersome and exhausting to the reader.


Table 3. Share of Each Third of Nation's Consumer Units¹ in
Aggregate Disbursements, 1935-1936

(Aggregate disbursements in millions of dollars.
Lower third: incomes under $780. Middle third: incomes of $780-$1,450.
Upper third: incomes of $1,450 and over.)

                                                             Percentage of aggregate
                                                             disbursement for each
                                  Aggregate disbursements    category made by
Category of disbursement            Lower   Middle    Upper    Lower   Middle    Upper
                                    third    third    third    third    third    third

Current consumption:
  Food                             $3,108   $5,310   $8,447     18.4     31.5     50.1
  Housing                           1,515    2,621    5,370     15.9     27.6     56.5
  Household operation                 703    1,422    3,160     13.3     26.9     59.8
  Clothing                            618    1,338    3,305     11.7     25.5     62.8
  Automobile                          203      753    2,823      5.4     20.0     74.6
  Medical care                        264      546    1,395     12.0     24.7     63.3
  Recreation                          115      362    1,166      7.0     22.0     71.0
  Furnishings                         112      368      942      7.9     25.9     66.2
  Personal care                       155      292      585     15.1     28.2     56.7
  Tobacco                             134      301      531     13.8     31.2     55.0
  Transportation other than auto      150      247      487     17.0     27.9     55.1
  Reading                              84      165      302     15.3     29.9     54.8
  Education                            30       87      389      5.9     17.2     76.9
  Other items                          35       76      196     11.4     24.6     64.0
    All consumption items          $7,226  $13,890  $29,098     14.4     27.7     57.9
Gifts and personal taxes²            $171     $516   $2,380      5.6     16.8     77.6
Savings                            -1,207     -252    7,437    -20.2     -4.2    124.4
    All items                      $6,190  $14,154  $38,915     10.4     23.9     65.7

¹ Includes all families and single individuals, but excludes residents in institutional groups.
² Taxes shown here include only personal income taxes, poll taxes, and certain personal
property taxes.

Source: National Resources Committee, Consumer Expenditures in the United States,
Estimates for 1935-36 (1939).

It is, in short, a method of condensation, and it is of the utmost
importance that, as it tells so much in so small a compass, it
tell it as clearly as practicable.”¹

¹ Falkner, Roland P., “Statistical Tabulation and Practice,” Journal
of the American Statistical Association, vol. 11 (1916), pp. 192-200.
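
The percentage columns of Table 3 answer a different question from those of Table 2: what share of a national aggregate is accounted for by each third. A minimal Python sketch of that computation follows; the function name `shares_of_aggregate` is a hypothetical label, and the dollar figures (in millions) are the "All items" and "Savings" rows as they appear to read in Table 3.

```python
def shares_of_aggregate(amounts_by_third):
    """Percentage of the aggregate contributed by each third, to one decimal."""
    total = sum(amounts_by_third.values())
    return {third: round(100 * amount / total, 1)
            for third, amount in amounts_by_third.items()}


# Aggregate disbursements in millions of dollars, as read from Table 3.
all_items = {"lower": 6190, "middle": 14154, "upper": 38915}
savings = {"lower": -1207, "middle": -252, "upper": 7437}

print(shares_of_aggregate(all_items))
print(shares_of_aggregate(savings))
```

The savings row shows why a share can exceed 100 per cent: when two groups dissave, the third group's savings are larger than the net aggregate.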






Tables 2 to 6 are examples of special-purpose tables. They
tell stories that are more or less hidden in the detailed but well- 


Table 4. Percentage Distributions of Nonrelief Families¹ in Six
Types of Community, by Income Level, 1935-1936

(Urban communities: metropolises,² 1,500,000 population and over; large cities,
100,000-1,500,000 population; middle-sized cities, 25,000-100,000 population;
small cities, 2,500-25,000 population. Rural communities: nonfarm³ and farm.)

                                                Families living in
                                          Urban communities             Rural communities
Income level               All   Metrop-     Large   Middle-     Small      Non-      Farm
                      families   olises²    cities     sized    cities     farm³
                                                      cities

Under $250                 2.8       1.7       ...       ...       3.1       3.0       3.8
$250-$500                  7.8       2.8       ...       5.5       6.3       8.9      13.9
$500-$750                 11.3       5.2       7.6       9.4      10.3      11.8      18.0
$750-$1,000               13.4       8.5      10.5      13.6      13.9      14.4      16.6
$1,000-$1,250             13.2      10.9      12.4      13.9      14.6      14.0      12.8
$1,250-$1,500             10.8      11.0       ...      11.6      11.1      11.6       9.8
$1,500-$1,750              9.1      10.8       ...       9.7       9.4       9.1       7.0
$1,750-$2,000              7.3       9.7       9.0       8.5       7.8       6.5       4.8
$2,000-$2,250              5.5       7.9       6.9       6.1       5.8       5.1       3.1
$2,250-$2,500              4.0       5.8       ...       4.5       4.0       3.4       2.5
$2,500-$3,000              5.2       8.5       ...       5.4       5.3       4.4       2.9
$3,000-$3,500              3.0       4.7       ...       3.1       3.1       2.3       1.6
$3,500-$4,000              1.8       2.9       ...       1.7       1.7       1.3       1.0
$4,000-$4,500              1.0       1.7       1.6       1.0       0.8       0.8       0.5
$4,500-$5,000              0.6       0.9       0.9       0.7       0.5       0.6       0.3
$5,000-$7,500              1.3       2.1       1.8       1.3       1.1       1.4       0.6
$7,500-$10,000             0.8       1.6       ...       0.6       0.6       0.6       0.4
$10,000 and over           1.1       3.3       ...       1.0       0.6       0.8       0.4
    All levels           100.0     100.0     100.0     100.0     100.0     100.0     100.0

¹ Excludes all families receiving any direct or work relief (however little) at any time
during year.
² Metropolises of this size are in North Central Region only (New York, Chicago,
Philadelphia, and Detroit).
³ Includes families living in communities with population under 2,500, and families living
in the open country but not on farms.

Source: National Resources Committee, Consumer Incomes in the United States, Their
Distribution in 1935-36 (1938), pp. 24-25.

organized statistics collected by means of the questionnaire
referred to above. In order to simplify the data for presentation,
income levels are divided into three groups, lower third, middle
third, and upper third. These tables illustrate also the use of
percentage figures to facilitate their interpretation.

Table 5. [...] Families for Consumption, Gifts and [...] Income Level, 1935-1936
[Table body not recoverable.]

Table 6. Average Expenditures of American Families for Main Categories of Consumption, by Income Level,
1935-1936
[Table body not recoverable.]
Source: National Resources Committee, Consumer Expenditures in the United States, Estimates for 1935-36 (1939), p. 23.
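
The division of income levels into lower, middle, and upper thirds can be sketched as follows. This Python sketch rests on an assumption that the text does not spell out, namely that each third holds an equal number of consumer units after they are ranked by income; the function name `split_into_thirds` and the sample incomes are hypothetical.

```python
def split_into_thirds(incomes):
    """Rank incomes and cut the ranked list into three groups of (nearly) equal size."""
    ordered = sorted(incomes)
    n = len(ordered)
    first_cut, second_cut = n // 3, (2 * n) // 3
    return {
        "lower third": ordered[:first_cut],
        "middle third": ordered[first_cut:second_cut],
        "upper third": ordered[second_cut:],
    }


# Hypothetical annual incomes in dollars, for illustration only.
sample_incomes = [1300, 520, 2400, 900, 300, 5200, 640, 1600, 1100]

for name, group in split_into_thirds(sample_incomes).items():
    average = sum(group) / len(group)
    print(f"{name:<13} incomes {group}  average ${average:,.0f}")
```

On this reading, the dollar boundaries quoted for each third are simply the incomes found at the two cut points of the ranked list.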

CHARTS 

Quick visualization of many rather complex situations can
be readily achieved by merely looking at a simple chart. It is
said that nowadays the first step toward using a series of data
for any sort of analysis is to represent the figures by a line drawn
on a chart. So useful is the chart in giving a quick grasp of the
characteristics of data that it has been adopted in many popular
books, in magazines, and in the financial section of metropolitan
newspapers. Figures 11 and 12 illustrate dramatically the
manner in which charts are used to aid in visualizing important
developments during wartime. In peacetime the trends of
data, even though less sensational, are watched with care, and
charts greatly facilitate their analysis.

[Fig. 11. Caption not recoverable. (... Statement of the United States Treasury Department.)]
The invention in 1786 of charting is claimed by William Playfair,
who set forth its advantages as follows:¹ “As the eye is the
best judge of proportion, being able to estimate it with more
quickness and accuracy than any other of our organs, it follows,
that wherever relative quantities are in question, a gradual
increase or decrease of any . . . value is to be stated, this mode
of representing it is peculiarly applicable; it gives a simple,
accurate, and permanent idea, by giving form and shape to a
number of separate ideas, which are otherwise abstract and
unconnected.”

Fig. 12. Production of munitions, including ships, planes, tanks, guns, ammunition,
and all field equipment. Index, November 1941 = 100. (Data from War Production Board.)

While the idea underlying the use of charts is quite old, the
general use of charts for wide public consumption is of much
more recent origin and probably owes its present-day popularity
to inventions having to do with the plating of charts for printing.
From being largely a hand-labor process, the making of plates
for the reproduction of charts has come in recent years to be a
photoelectric process, with the result that today the most
expensive part of the charts in a book, newspaper, or magazine
article consists in the mental and hand labor involved in the
original construction of the chart.

There are five kinds of charts: (1) pictograms, (2) cartograms,
(3) frequency curves, (4) bivariate charts, and (5) curves picturing
time series.

¹ The Commercial and Political Atlas (3d ed., London, 1801), p. x. Playfair's
claim to be “actually the first who applied the principles of geometry
to matters of Finance” is made on pages viii and ix. Cited from W. C.
Mitchell, Business Cycles: The Problem and Its Setting, p. 209. In An
Enquiry into the Decline and Fall of Nations Playfair is said to have been the
first to employ graphical devices in the treatment of sociological discussion.
“William Playfair was, one may say, the Sir William Petty of the Edinburgh
group . . .” Lancelot T. Hogben, Dangerous Thoughts (1939), p. 283.





Pictograms. There are four kinds of pictograms: (1) linear pictograms,
in which the comparison is a linear one; (2) areal
pictograms, in which the comparison is one of areas; (3) cubic
pictograms, in which the comparison is one of cubes or three-dimensional
objects; and (4) sectors and circles, in which a circle
is used to represent a whole and its various sectors are parts
of the whole.



Fig. 13. Distribution of the milk dollar. (Data from The Milk Dollar, Milk
Industry Foundation.)

The purpose of pictograms is to aid in rapid visualizing of
coordinate comparisons of magnitudes. For example, a pictogram
might represent by a picture of a man the population of the
United States, accompanied by a picture of proportionately
smaller men representing, respectively, the populations of
France and Germany. Sometimes pictograms are used to aid
in visualizing the proportional parts of a whole magnitude or
comparison of component parts, as where a dollar is shown divided
into sectors representing the way in which the public dollar
is spent. Figure 13 is an illustration of a pictogram showing the




“milk dollar.” The small obscured piece at the top represents
2.98 cents of profit for the New York City distributors.
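
When a whole is shown as a circle, each component's sector takes the same fraction of 360 degrees that the component takes of the whole. The Python sketch below illustrates this for the 2.98-cent profit share of the milk dollar; the function name `sector_angles` and the lumping of every other use into a single "all other uses" entry are assumptions made only for illustration.

```python
def sector_angles(parts):
    """Central angle in degrees for each component part of a whole."""
    total = sum(parts.values())
    return {name: 360 * value / total for name, value in parts.items()}


# Cents out of each milk dollar; only the profit figure comes from the text.
milk_dollar = {
    "distributor profit": 2.98,
    "all other uses": 100 - 2.98,  # remainder, lumped together for this sketch
}

for name, angle in sector_angles(milk_dollar).items():
    print(f"{name:<20}{angle:8.2f} degrees")
```

A sector of under 11 degrees is a thin sliver, which agrees with the description of the profit share as a small piece of the chart.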

Areal and cubic comparisons are not frequently used because,
instead of simplifying the comparison desired, they are likely
to confuse it. This is because the mind finds difficulty in
quickly differentiating sizes of areas or of cubes. Figure 14
shows two areas in the form of squares. One of these areas is
actually one-half as large as the other; but, at first glance,
it seems to be more than half as large. Consequently, if comparison
of two quantities is desired by charting, areal presentation
is not a desirable method of obtaining easy comprehension
of the differences that it is desired to stress.

Fig. 14. Area comparison.

Fig. 15. Cubic comparison.

The difficulty is increased if the attempt is made to chart 
differences of magnitude by the use of cubes, for it is still more 




difficult for the eye and mind to grasp geometric comparative
magnitudes in three dimensions. This is shown in Fig. 15,
which depicts two cubes, one of which is one-half as large as the
other, though a first glance makes it appear to be two-thirds
as large. For this reason the use of pictures for making comparisons
is not considered to be the best practice. For example,
the presentation for quick visualization of different-sized men
in uniform to represent the relative fighting strength of various
countries, or of different-sized battleships to represent the relative
size of navies, will confuse the interpretation that the eye and
mind will give to the relative sizes compared, even though the
relative size is given purely a linear setting in the actual drawing
of the figures.¹ Only the height of the uniformed men may be
varied, but this might lead to comically proportioned men and
an illusion of armies of tall thin men vs. armies of short fat men.
If the uniformed men are properly proportioned for their varying
heights, this results in an areal comparison.
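
One way to see why areal and cubic comparisons mislead is to compute how long the side of the smaller figure must be. The Python sketch below is offered only as an illustration of that geometry; the function name `side_ratio` is hypothetical. If one quantity is one-half of another, the sides of the two figures stand in the ratio of the square root of one-half for squares and the cube root of one-half for cubes.

```python
def side_ratio(quantity_ratio, dimensions):
    """Ratio of side lengths when a ratio of quantities is drawn as
    lengths (1 dimension), areas (2), or volumes (3)."""
    return quantity_ratio ** (1.0 / dimensions)


for figure, dims in [("bar", 1), ("square", 2), ("cube", 3)]:
    print(f"{figure:<7} half the quantity -> side {side_ratio(0.5, dims):.3f} as long")
```

Since the eye tends to judge by the side, a square with a side about 0.71 as long, or a cube with an edge about 0.79 as long, reads as more than half the size, while a bar half as long reads correctly.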

Consequently, the most generally used types of pictogram
are those involving merely linear comparisons and the use of
purely abstract linear distances. Rows of soldiers, each soldier
representing a specified number of men, may be used to advantage,
however, the longer row representing the larger army.
Similarly, large and small navies can properly be compared by
rows of ships, each ship representing a specified tonnage of that
type of warship. Such pictograms are really linear comparisons,
as also are bar charts and sectors of circles.
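
A linear pictogram of this kind amounts to dividing each magnitude by the amount one symbol stands for and drawing that many symbols in a row. The Python sketch below illustrates the idea; the function names, the `#` symbol, and the army sizes are all hypothetical and serve only as an example.

```python
def symbol_count(quantity, per_symbol):
    """Number of whole symbols needed to represent a quantity."""
    return round(quantity / per_symbol)


def pictogram_row(label, quantity, per_symbol, symbol="#"):
    """One row of a linear pictogram: a label followed by repeated symbols."""
    return f"{label:<8}{symbol * symbol_count(quantity, per_symbol)}"


# Hypothetical army sizes; each symbol stands for 100,000 men.
armies = {"Army A": 600_000, "Army B": 300_000}

for name, men in armies.items():
    print(pictogram_row(name, men, per_symbol=100_000))
```

Because every symbol is the same size, the comparison rests on the length of the row alone, which is the property the text recommends.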

Bar Charts and Sectors of Circles. The use of bar charts
and sectors of circles is widely practiced and finds its application
whenever it is desired to compare two or more differing magnitudes
with each other or to give quick visualization of component
parts of a given magnitude. Extensive use of vertical
or horizontal bars is made by the United States Bureau of the
Census in the Statistical Atlas of the United States, one of which
was issued in 1914 and another in 1924. In addition, many
modern writings, especially in the fields of the social sciences,
attempt to portray by charts the statistics it is desired to present
for popular reading.

¹ Cf. Croxton, F. E., and Harold Stein, “Graphic Comparisons by Bars,
Squares, Circles, and Cubes,” Journal of the American Statistical Association,
vol. 27 (1932), pp. 54-60.





Figure 16 is a graphic portrayal of the budget expenditures
of the Federal government, based upon legislation in effect in
February, 1943, in which the blacked-out portion of the vertical
bars reveals in a striking manner the expected increases from
year to year in expenditures for war activities.


[Chart: vertical bars, in billions of dollars, for fiscal years 1942, 1943, and 1944,
divided into trust accounts; government corporations and agencies (net expenditures);
and total expenditures, general and special accounts (other activities, interest on
public debt, and war activities).]

1 Transactions in checking accounts.
2 Includes statutory public debt retirement.

Fig. 16. Budget expenditures of the Federal government, based upon legislation
as of February, 1943. (The Budget of the United States Government.)


The use of horizontal bars is illustrated in Fig. 17, which
shows graphically the statistical data in Table 4. The differences
between distribution of income among nonrelief families in
metropolitan areas as compared with that among families on
farms is seen at a glance, and a slight scrutiny of the bars brings
out the less dramatic but clear differences in the distribution of
income in small cities compared with that in the larger ones.

Another government publication contains data, shown in 
Table 6, from which charts were drawn that illustrate the use 






level. This makes possible the visual comparison of the average
total family expenditure at various income levels. For example,
at the income level of $2,000 to $2,500 the aggregate family
expenditure averages a little over $2,000. At the same time





the amount spent for various purposes can be seen from the
differently crosshatched parts of each bar. Throughout the
bars one kind of crosshatching represents a specified kind of

[Figure 18: horizontal bar diagram. One bar for each income level
($15,000-20,000; $10,000-15,000; $5,000-10,000; $4,000-5,000;
$3,000-4,000; $2,500-3,000; $2,000-2,500; $1,500-2,000; $1,000-1,500;
$500-1,000; under $500; average, all levels), crosshatched by kind of
expenditure. Note: Taxes shown here include only personal income taxes,
poll taxes, and certain personal property taxes.]

Fig. 18. Use of income by American families at different income levels,
illustrating the use of bar diagrams. (Based on Table 6.)

[Figure 19: 100 per cent horizontal bar diagram. One bar for each income
level ($15,000-20,000; $10,000-15,000; $5,000-10,000; $4,000-5,000;
$3,000-4,000; $2,500-3,000; $2,000-2,500; $1,500-2,000; $1,000-1,500;
$500-1,000; under $500; average, all levels), crosshatched by kind of
expenditure, including food, housing, clothing, and automobile. Note:
Taxes shown here include only personal income taxes, poll taxes, and
certain personal property taxes.]

Fig. 19. Percentage use of income by American families at different income
levels, 1935-1936, illustrating the use of 100 per cent bar diagrams. (Based
on Table 6.)

expenditure. The second desirable comparison is still more
quickly grasped by the use of 100 per cent component part bars,
which is illustrated in Fig. 19. When such a chart is drawn,









it is always advisable to warn readers that 100 per cent bar
charts are being used; in addition, the table of actual figures
should be given, for the actual figures are completely concealed
in the relative figures if only the chart is given. It will be
noticed that clever arrangement of crosshatching, placing
contrasting types adjacent to each other, aids greatly in the reading
of the chart. Figures 20 and 21 are interesting uses of the bar



Fig. 20. Variation in expenditures with income, illustrating the use of a
crosshatched zone diagram. [National Resources Committee, Consumer Expenditures
in the United States, Estimates 1935-1936 (1939), pp. 165-166.]

chart, virtually in the form of zones, to show the distribution of
the consumer food dollar on the assumption of four different
total national income levels. The same data are shown in
Fig. 21 in the form of a 100 per cent bar or zone chart. The use
of the zone effect has the advantage of aiding the eye to make
the principal indicated comparisons.
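
The conversion from actual amounts to 100 per cent component parts,
discussed above, may be sketched in Python. The expenditure figures are
hypothetical and are not the values of Table 6; the function name
`percent_components` is likewise an assumption made for this illustration.

```python
def percent_components(amounts):
    """Convert the component amounts of one bar into percentages of its total."""
    total = sum(amounts.values())
    return {name: 100.0 * value / total for name, value in amounts.items()}

# Hypothetical family expenditures in dollars (not taken from Table 6).
low_income = {"food": 400, "housing": 200, "clothing": 100, "other": 100}
high_income = {"food": 1200, "housing": 900, "clothing": 500, "other": 1400}

for label, bar in (("low", low_income), ("high", high_income)):
    shares = percent_components(bar)
    print(label, {name: round(pct, 1) for name, pct in shares.items()})
```

Each bar now totals 100 per cent, so the actual dollar amounts can no
longer be read from it; this is why the table of actual figures should
accompany such a chart.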

There are many examples of the use of sectors of circles in
the Statistical Atlas of the United States, census of 1920, and a




number in the publications of the census of 1930. Figure 22 is an 
example of a single circle divided into sectors representing 
component parts in the utilization of milk in the United States 
in 1929. As in the case of the component bar charts, so also 
in the case of sectors of a circle, it is possible to represent changes 



[Figure 21: 100 per cent crosshatched zone diagram. Horizontal scale:
size of income, billions of dollars (50, 60, 70, 80). Vertical scale:
per cent, to 100. Zones: food, housing, household operation, clothing,
automobile, other items, gifts and taxes, savings.]

Fig. 21. Variation in percentages of various expenditures with income,
illustrating the use of a 100 per cent crosshatched zone diagram. [National
Resources Committee, Consumer Expenditures in the United States, Estimates
1935-1936 (1939), pp. 165-166.]

from time to time in percentage components by the use of a
series of circles. It is not advisable to use the sectors and
circles as bars were used in Fig. 21, namely, to picture relative
change and total change simultaneously. To do this with
sectors and circles involves areal comparisons that are not
grasped by the readers of the charts. In Fig. 23, which is
presented to illustrate the use of sectors and circles, the attempt




has been made to show also such an areal comparison. While
most people would see at a glance that the circle for the end of
1938 is smaller than the circle for the end of 1930, presumably
to indicate that the total United States long-term investments
in foreign countries was smaller in 1938 than in 1930, few could
see from the areal comparison of the circles how much smaller.



Fig. 22. Utilization of milk in the United States, 1929, illustrating the use
of sectors of circles. Based on value. (Fifteenth Census of the United States,
1930, Vol. 4, Agriculture.)

Perhaps it is sufficient to have the smaller 1938 circle call
attention to the fact and then assume that the reader will be led
thereby to note the figures, which are shown in a separate table.
But the figure shown in each sector of the circles is a component
percentage and does not throw light on aggregate amount.

For the purpose of showing graphically the component parts
of a total, the split-bar chart is a promising new device. Figure
24 illustrates its use to show the distribution of the consumer
food dollar. Comparison between consumer dollars of varying



[Figure 23: two circles of different sizes, one for the end of 1930 and
one for the end of 1938, each divided into sectors for Canada and
Newfoundland, Europe, Latin America, and rest of world.]

Fig. 23. The United States' long-term investments in foreign countries, end
of 1930 and end of 1938, illustrating use of circles of different sizes. (Bureau of
Foreign and Domestic Commerce, "The Balance of International Payments of the
United States in 1938," p. 49.)

[Figure 24: split-bar chart of one dollar ($1.00), divided into the shares
going to retailers, to wholesalers, to transportation, to processors, and
to farmers.]

Fig. 24. Distribution of consumer food dollar, 1935, illustrating use of a
split-bar chart. [National Resources Committee, The Structure of the American
Economy, Part I, (1939), p. 68.]



[Map: Concentration of wage earners in manufacturing industries, 1935
(200 counties with largest number of wage earners).]

[Map: Nonpar banks, December 31, 1938.]

Fig. 27. Nonpar banks in the United States, Dec. 31, 1938, illustrating use of
the point-dot system. ("Federal Reserve Bulletin," February, 1940, p. 94.)

[Map: Percentage of mortgaged owner-operated farms with interest rates of
6.5 per cent or over on first-mortgage debt, April 1, 1940. Surviving
caption fragment: "of density crosshatching. (Source".]

dots should be of sufficient relative size so that there will not be
too many of them. An example is shown in Fig. 26.

The chief difficulty in the use of this kind of cartogram is the
mechanical one of arriving at the proper magnitude to assign
to each dot of uniform size. If the magnitude assigned to each
dot is too large, it becomes difficult to show graphically the
small quantities relating to geographical locations where the
characteristic is scarce. On the other hand, if the magnitude
assigned to each dot is too small, this results in too great a
crowding of the dots in areas where the characteristic is very plentiful.
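
The difficulty just described can be made concrete with a small Python
sketch. The quantities for the three areas are hypothetical, and the
function name `dots_per_area` is an assumption made for this illustration;
it simply counts how many dots of a given value each area would receive.

```python
def dots_per_area(quantities, value_per_dot):
    """Return the number of whole dots each area receives at a given value per dot."""
    return {area: round(q / value_per_dot) for area, q in quantities.items()}

# Hypothetical quantities of some characteristic in three areas.
quantities = {"A": 90_000, "B": 4_000, "C": 300}

for value_per_dot in (100, 1_000, 10_000):
    print(value_per_dot, dots_per_area(quantities, value_per_dot))
```

With a large value per dot the scarce areas receive no dots at all; with a
small value per dot the plentiful area receives hundreds of dots, which
would crowd together on the map.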





In Fig. 26, this is illustrated by the attempt to picture the
volume of wholesale trade of the state of New York, compared
with the rest of the country. The dots are so dense that it is
hardly possible to count their number. While the general



jncture of relatiip density j5 guipJLlj. ijjjjoluied/jxuioimch a 
this purpose can be better served by the use of the point-dot
map. Another objection to the dot-of-uniform-size map for
this particular purpose is that it may convey the impression
that the concentration of wholesale trade is over the whole
state of New York, whereas it is known to be concentrated in the

(Source: Interstate Commerce Commission.)

Fig. 32. Distribution of rubber manufacturing in three leading states in 1937,
illustrating a dramatic use of a point-dot map. [Reproduced from Barker, P. W.,
and E. G. Holt, Rubber Industry of the United States, 1839-1939 (Bureau of Foreign
and Domestic Commerce), p. 20.]






metropolitan area of New York City. This conception is
more clearly brought out by the use of the device of using large
dots of varying size.

The third type is the point-dot cartogram, in which each dot
means a certain quantity, but the dots are so small that they
cannot be conveniently counted. The significance lies in
presenting the idea of relative density of dots. Figure 27 shows the
concentration in the Southeast and the Northern Middle states
of nonpar banks of the United States.

Cartograms by Colors and Shades. Obviously, the same effect
can be produced by the use of colors and shades as by the use of
dots, but the former are expensive to reproduce in print and
therefore are not extensively employed. The Statistical Atlas
of the eleventh and twelfth censuses of the United States
contains numerous such cartograms.

Cartograms by Crosshatching. Making comparisons relating to
geographical location by crosshatching maps has increased in
popularity during recent years. It is more effective than the
method of dots and is cheaper than coloring and shading.
Figure 28 makes it easy for the reader to visualize the variation
in different parts of the United States in the proportion of
mortgaged owner-operated farms paying rates of interest as
high or higher than 6.5 per cent. Figure 29 shows at a glance
the variation from state to state in the percentage increase in
nonagricultural employment from 1940 to 1943.

Figure 30 is an interesting experiment in the combined use
of a map and bar chart to show variation in the percentage
increase in manufacturing employment in various metropolitan
areas from 1940 to 1943. Figure 31 shows the use of a map and
bars to depict flow of freight traffic in the United States. In
Fig. 32 the geographical concentration of the rubber-manufacturing
industry in three states of the United States is
dramatically emphasized by showing outline maps of only those
three states.



CHAPTER V

STATISTICS— A STUDY OF VARIATION 


Ubiquitous Variability. Only in the abstract sense is there
such a thing as a fixed quantity; in all cases, with reference both
to physical and to psychic things, practical quantitative
expressions are variables. However fixed the true quantity may be,
no human measuring device is capable of giving the exact
quantity; hence all measurements obtained are approximations.
In both physical sciences and social sciences the raw materials
amenable to the techniques of statistics are quantitatively
expressed variations. The methods of analysis are likely to be
complex when the scientist is faced with complex variability.
This fact for the social sciences is recognized in the following
quotation:* "The social scientist is limited by the fact that he
does not deal with rational material but with the rational and
irrational conduct of man. The host of variables which this
fact introduces multiplies the obstacles to his work and sets
limits to the applicability of results."

USE OF SYMBOLS 

Simplification of the complex methods that need to be used in
statistics is accomplished by the use of symbols. Because
symbols are used for various purposes, beginners may have a natural
psychological reaction unfavorable to the study of statistics.
The uninitiated may be mystified and frightened away from the
subject on account of the symbolic presentation. It is
important, therefore, to realize that the symbols used in statistics are
quite simple and that there are not very many of them.
Furthermore, they are easily learned and remembered as soon as
their real purpose of simplification is understood.
* Fosdick, Raymond B., A Review for 1939, The Rockefeller Foundation,
pp. 41-42. This foundation contributes extensively to the support of
research in many scientific fields; for example, it contributes to such research
organizations as the Brookings Institution and the National Bureau of
Economic Research, discussed in Chap. III.


The Variable X. The study of variation is the meat and
bones of the craft. The variable X is not a new idea to anyone
who has gone as far as a first course in algebra and who has on
many occasions said, "Let X equal . . . ." Symbols enter into
statistical analysis in only three ways:

1. To represent variation in size with time; in such a case the
data measuring the variable are designated "time series."

2. To represent variation in order of magnitude from smallest
to largest, or vice versa (if time is involved, it is disregarded, as
the variable is rearranged or reclassified upon the basis of
magnitude); in such a case the data measuring the variable are
designated "frequency series."

3. To represent variation in quality or attribute (for example,
occupation, geographical location, or race).

In symbolic language, it is purely a matter of convention that
the variable may be referred to as X or as Y or as Z. In a given
problem, if the nomenclature of X is assigned to a given variable,
it is necessary to retain that symbol for that particular variable
throughout the problem. In the theory of statistics conventions
have arisen as to the use of symbols; for example, variables are
commonly designated by the letters at the end of the alphabet,
while constants or known figures are designated by the letters
at the beginning of the alphabet.

One convention widely followed is to use a bar over a letter
to designate the arithmetic mean, so that $\bar{X}_i$ (read "$X_i$ bar") is
the symbol for the mean of a series of $X_i$'s. Another group of
X's would be $X_j$ and their mean $\bar{X}_j$. The subscripts i and j,
respectively, symbolize subgroups. For example, all the X's
may refer to the I.Q.'s of college freshmen; $X_i$ refers to the I.Q.'s
of male freshmen; and $X_j$ refers to the I.Q.'s of female freshmen.
Accordingly, $\bar{X}_i$ symbolizes the mean I.Q. of male freshmen, and
$\bar{X}_j$ symbolizes the mean I.Q. of female freshmen. It is then
conventional to designate the mean of all the X's, both $X_i$ and
$X_j$, as $\bar{X}$ (called "X bar").
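
The bar notation for subgroup means and for the mean of all the X's may be
illustrated by a short Python sketch. The I.Q. scores are hypothetical, and
the variable names are assumptions chosen to echo the symbols in the text.

```python
from statistics import mean

# Hypothetical I.Q. scores (illustrative only).
x_i = [110, 120, 115]         # male freshmen, the X sub i group
x_j = [125, 105, 118, 116]    # female freshmen, the X sub j group

mean_i = mean(x_i)            # X-bar sub i
mean_j = mean(x_j)            # X-bar sub j
mean_all = mean(x_i + x_j)    # X-bar, the mean of all the X's

# The mean of all the X's equals the subgroup means weighted by subgroup size.
weighted = (len(x_i) * mean_i + len(x_j) * mean_j) / (len(x_i) + len(x_j))
print(mean_i, mean_j, round(mean_all, 3), round(weighted, 3))
```

The mean of all the X's is not in general the simple average of the two
subgroup means; each subgroup mean must be weighted by the number of
items in its subgroup.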

Another commonly used convention is to designate an
estimated figure by a letter followed by prime. According to this
convention, if an estimate is made of the value of X (for example,
the coming crop yield of wheat based upon reports to the United
States Department of Agriculture), the estimate is symbolically
designated X'. Similarly, if an estimate of X (the price of
wheat, for example) is made from information on supply and
demand data, it is called X'. The small Greek letter sigma (σ)
is used to designate standard deviation.¹ A special estimate
of the standard deviation is symbolized by a

It is a common practice to use certain other Greek letters to
symbolize statistics. Accordingly, $\mu_1, \mu_2, \mu_3, \ldots, \mu_n$ (the Greek
letter mu) symbolize the series of statistics called "moments"
about a mean of a sample. The symbol π refers to the constant
3.1416. The symbols $\nu_1, \nu_2, \ldots, \nu_n$ (the Greek letter nu) refer
to moments about an arbitrary origin.

While the use of symbols has become fairly well standardized
in some respects along the conventional lines indicated, complete
uniformity and consistent systematization are far from realized.
Even the simple conventions above enumerated are not
universally followed. Nevertheless, the student will find it an
advantage to have his attention directed to these trends in
symbolic representation.
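
The two kinds of moments mentioned above can be sketched in Python. This
is a minimal illustration under stated assumptions: the sample and the
arbitrary origin are hypothetical, the helper `moment_about` is a name
invented here, and each moment is taken as the sum of powered deviations
divided by the number of items, which is an assumption about the
definition rather than a statement of the one developed later in the text.

```python
def moment_about(values, origin, order):
    """Average of the deviations from `origin`, each raised to the power `order`."""
    return sum((v - origin) ** order for v in values) / len(values)

sample = [2, 4, 4, 6, 9]                     # hypothetical sample
sample_mean = sum(sample) / len(sample)      # 5.0
arbitrary_origin = 4                         # hypothetical arbitrary origin

mu = [moment_about(sample, sample_mean, k) for k in (1, 2, 3)]
nu = [moment_about(sample, arbitrary_origin, k) for k in (1, 2, 3)]

print("moments about the mean:", mu)
print("moments about the origin 4:", nu)
```

The first moment about the mean is zero, and the second moment about the
mean equals the second moment about the arbitrary origin less the square
of the first moment about that origin.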

TIME SERIES 

Conventional Use of X and T to Symbolize Passage of Time. A
convention in time-series analysis is that X is used to refer to the
passage of time. T is also used for this purpose.² It happens
that the same symbol X is conventionally used in geometry,
trigonometry, and the like to refer to the horizontal axis in a plane.
The unification of these two conventions results in the convention
in statistics that, in making graphs of statistical time series, the
X axis (the horizontal axis) is used to represent the passage of
time. Thus the passage of time may be indicated by a series of
$X_1, X_2, X_3, X_4, \ldots, X_n$, as shown in Fig. 33, where X refers
to years:

    1940     1941     1942     1943
    $X_1$    $X_2$    $X_3$    $X_4$

    Fig. 33.

or as shown in Fig. 34, where X refers to months:

    Jan.     Feb.     Mar.     Apr.
    $X_1$    $X_2$    $X_3$    $X_4$

    Fig. 34.

1 For further discussion of the standard deviation, see Chap \ I
2 See Chaps \I\ WH




As indicated, $T_1, T_2, T_3, \ldots, T_n$ may also represent the passage
of time.

Lower-case letters x and t refer to deviation from the mean;
that is, $X - \bar{X} = x$; $T - \bar{T} = t$.
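
The lower-case notation for deviations may be illustrated in Python. The
observations are hypothetical, and the variable names are assumptions
chosen to mirror the symbols X, X bar, and x.

```python
X = [4, 7, 9, 12]                     # hypothetical observations
X_bar = sum(X) / len(X)               # arithmetic mean
x = [value - X_bar for value in X]    # deviations from the mean, x = X - X-bar

print(X_bar)
print(x)
print(sum(x))
```

The deviations from the arithmetic mean sum to zero, since the positive
and the negative deviations balance one another.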

Where the Variable Fluctuates in Size with Time. When the
statistician is dealing with a variable that fluctuates in size with
the passage of time, he refers to this variable as Y. This is a
convention; there is no logical reason for it except that he has
already used the symbol X or T to refer to time and wants to
have a different symbol for the variable being studied as it
fluctuates through time. This situation is described in technical
language by saying that the variable is a "function" of time, by
which is meant merely that, as time passes, the variable fluctuates
in magnitude, one way or another. The simple symbolic way of
saying exactly the same thing (where X refers to time and Y
refers to the variable) is

$$Y = F(X)$$

There is nothing mysterious to be read into this expression. It
is merely a use, slightly different from the ordinary one, of the
equality sign; and the whole expression means that Y is a
function of X, or the variable which is being studied is a function of
time, meaning that it fluctuates with the passage of time. This
may be illustrated by one or two examples, imaginary figures
being used.

Time Passes in 1944

The unit that constitutes the variable is the price of sugar per pound in
the New York City market (average for the month of prevailing daily prices).

    X                      Y
    $X_1$   January        $Y_1$   3 cents
    $X_2$   February       $Y_2$   2 cents
    $X_3$   March          $Y_3$   4.3 cents
    $X_4$   April          $Y_4$   5 cents
    $X_5$   May            $Y_5$   4 cents
    $X_6$   June           $Y_6$   2.8 cents

Thus $X_1$ is the first unit of time (January), and $Y_1$ is the
measurement of the variable Y at that time according to the
designated unit of description; in other words, $Y_1$ is the price in
January. Similarly, $Y_2$ is the price in February ($X_2$, or the
second unit of time), and so on.
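
The pairing of each unit of time with the measurement of the variable may
be sketched in Python, using the imaginary sugar prices given above. The
names `series` and `F` are assumptions made for this illustration; `F` is
nothing more than a lookup of Y for a stated X.

```python
months = ["January", "February", "March", "April", "May", "June"]   # X1 ... X6
prices = [3.0, 2.0, 4.3, 5.0, 4.0, 2.8]   # Y1 ... Y6, cents per pound (imaginary)

# Pair each unit of time with the measurement of the variable at that time.
series = dict(zip(months, prices))

def F(x):
    """Return the value of the variable Y for the unit of time x."""
    return series[x]

print(F("January"))    # Y1, the price in January
print(F("February"))   # Y2, the price in February
```

To say that Y is a function of X means only that each unit of time has its
own value of the variable, as the lookup shows.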





The unit of time may be the week, as where the unit that constitutes the
variable is the amount of rainfall in inches in New York City per week.

    X                Y
    First week       0.1 inch
    Second week      4.0 inches
    Third week       0.3 inch
    Fourth week      0.7 inch

In this illustration, $X_1$ refers to the first week, $X_2$ to the second
week, etc., while $Y_1$ refers to the inches of rainfall in the first
week, $Y_2$ to the inches of rainfall in the second week, etc.

The unit of time may be the year, as where the unit that constitutes the
variable is the net worth of a business enterprise on Jan. 1 of each year.

    X          Y
    1936       $20,001.00
    1937       $2S,546.00
    1938       $21,527.00
    1939       $20,250.00
    1940       $27,430.00
    1941       $35,240.00

It is customary in geometry, trigonometry, etc., to let the
vertical axis represent the Y variable; fluctuations in Y are shown
by vertical distances. The unification of this custom with
statistical presentation results in the convention that, when a
graph is made of a variable that is a function of time, fluctuations
in the Y variable are shown by vertical distances, while time
change is indicated along the X axis, or horizontally.

Figure 35, showing comparative changes in cash farm income,
farm mortgage debt, and value per acre of farm real estate for
years 1910-1942, is an illustration of the graph of a time series.

Careful Description of Units Involved. One or two matters
concerning the units involved in time series should be noted.
Sometimes the variable refers to an average value over a specified
period of time; in the first illustration above, the average price
of sugar per pound in New York City is an average over a period
of a month. In other instances, the variable refers to a total for
a given period of time; in the second illustration above, the inches
of rainfall are given by totals per week. In still other problems,
the variable refers to a quantity at the beginning of a period of
time or at the end of a period of time; in the third illustration,
the net worth of a business enterprise on Jan. 1 of successive
years was used. In Fig. 35, cash farm income is in totals for
calendar years, each year's total being expressed as a percentage
of the average 1910-1914 yearly income. Farm-mortgage debt
is in amounts as of Jan. 1 each year, expressed as a percentage
of the average 1910-1914 annual amounts. Value per acre of
farm real estate is in amounts as of Mar. 1 each year, expressed
as percentages of the average 1912-1914 annual amounts.

It is important in connection with the study of time series to
know exactly how the variable is being used. Of equal importance
is it that exact indication of this should be given. Every
good statistician invariably indicates either in titles of tables or
in footnotes just what his variables mean. He should do this
no matter how expert a statistician he is and no matter how clear,
without such explanation, his work may seem to him.

[Figure 35: line chart. Source: U.S. Department of Agriculture, Bureau of
Agricultural Economics.]

Fig. 35. Cash farm income, farm-mortgage debt, and value per acre of farm real
estate, index numbers, United States, 1910-1942.

Cumulative and Noncumulative Data. Another important
matter is the difference between cumulative and noncumulative
data in time series. The fundamental distinction between
cumulative and noncumulative data is really the difference
between data of "condition" and data of "change." Cumulative
data are the data of change. It is possible to add the data
on weekly rainfall and thus obtain data on monthly rainfall
or yearly rainfall. Sales of a store by the week can be added to
get sales by the month or by the year. It is possible to cumulate
the number of births daily in order to get the total number of
births per month or per year. Income and outgo figures are
cumulative data. To those who have studied accounting, a
convenient analogy is to the profit and loss statement; figures
in the profit and loss statement in the main represent cumulative
data.

Noncumulative data are those describing a condition and are
not subject to the additive treatment. The average price of
sugar per week cannot be cumulated to obtain the average price
of sugar per month or per year. It is necessary to resort to
averaging. The daily figures on population cannot be added
in order to get the monthly population figures. A balance
of $3,000 in the bank in January and of $5,000 in March do not
give you a balance of $8,000 for the two months. These are
items of condition and cannot be added. In order to obtain
significant summary results in the case of noncumulative data
over several periods of time, it is necessary to average rather
than to add.
The method of averaging is applicable not only to the
noncumulative but to the cumulative type of data. It is significant
to speak of the average daily rainfall during a given month or
year, or the average weekly rainfall during a given month or
year, or the average weekly sales of a given year, etc.

Another way of referring to a time series is to describe it as the
situation in which a variable is classified according to the time
of its occurrence. The basis of classification is time, and the
most logical arrangement of the data in question is on that basis.
As will be seen, the data of a time series may be reclassified for
certain purposes, and when this is done
they no longer constitute a time series.
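
The distinction drawn above between data that may be added and data that
must be averaged can be sketched in Python. The weekly rainfall figures are
the imaginary ones used earlier in the chapter; the two bank balances are
hypothetical values chosen only for illustration.

```python
from statistics import mean

weekly_rainfall = [0.1, 4.0, 0.3, 0.7]   # inches per week: data of change (cumulative)
bank_balance = [3000, 5000]              # dollars on two dates: data of condition

total_rainfall = sum(weekly_rainfall)             # adding is meaningful
average_weekly_rainfall = mean(weekly_rainfall)   # averaging is meaningful as well
average_balance = mean(bank_balance)              # only averaging is meaningful here

print(round(total_rainfall, 1))
print(round(average_weekly_rainfall, 3))
print(average_balance)
```

Summing the two balances would yield a figure that describes no condition
that ever existed, whereas their average is a fair summary of the
condition over the period.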

Charting Time Series. When a time series is graphed, the
X axis is used to represent passage of time, while the Y axis is
used to represent varying magnitudes. Thus, in Fig. 36, the
points plotted would represent a magnitude equal to 2 in 1940,
equal to 1 in 1941, and equal to ½ in 1942, rising to 1½ in 1943.
It is conventional to represent time series by lines or curves
connecting the plotted points. In graphic phraseology these
lines may be drawn through the plotted points as polygons (e.g.,
Fig. 36), or the changes in direction may be curved.

Two kinds of charts are in general use for the graphic
presentation of time series: (1) arithmetic charts and (2) ratio charts.

ARITHMETIC AND RATIO CHARTS 

Arithmetic Charts. The arithmetic chart pictures arithmetic
changes in magnitude. For illustration, in Fig. 37 is shown a
variable magnitude represented by the line AA', increasing by 1
during each time interval. This produces a straight line. On
such a scale any variable increasing at a constant rate would give
a straight line; but any variable increasing at a constant relative
rate would produce an ever-steeper curve. This is illustrated by
BB', which shows a magnitude doubling in each interval, that is,
increasing at a constant relative rate.

Fig. 37. Constant growth and constant rate of growth.

Fig. 38. Showing effect of omitting zero line.
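
The contrast between the two lines can be sketched in Python. The starting
value of 1 for both series is an assumption made for this illustration; the
text specifies only that AA' increases by 1 and that BB' doubles in each
interval.

```python
intervals = range(6)
aa = [1 + t for t in intervals]      # AA': increases by 1 in each interval
bb = [2 ** t for t in intervals]     # BB': doubles in each interval

aa_differences = [b - a for a, b in zip(aa, aa[1:])]
bb_differences = [b - a for a, b in zip(bb, bb[1:])]
bb_ratios = [b / a for a, b in zip(bb, bb[1:])]

print(aa_differences)   # constant differences: a straight line on an arithmetic chart
print(bb_differences)   # growing differences: an ever-steeper curve
print(bb_ratios)        # constant ratios: a constant relative rate
```

On the arithmetic chart it is the differences that govern the slope, so
that only the series with constant differences plots as a straight line.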

The significant comparison in such a chart is always with zero,
and hence the zero line should invariably be included in the chart.
Leaving out the scale between zero and the point where the
curve reaches its lowest point will give a deceptive appearance
to the changes that occur. This is illustrated in Fig. 38, where
$P_2$ is really larger than $P_1$ (see scale) but appears in the figure
to be twice as great because only part of the vertical scale is
shown.
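
The deceptive effect of omitting the zero line can be put in figures with a
short Python sketch. The two magnitudes and the point at which the scale is
made to begin are hypothetical and are not read from Fig. 38; the function
name `apparent_ratio` is an assumption made for this illustration.

```python
def apparent_ratio(p1, p2, axis_start):
    """Ratio of the plotted heights when the vertical scale begins at axis_start."""
    return (p2 - axis_start) / (p1 - axis_start)

p1, p2 = 110, 120   # hypothetical magnitudes

print(apparent_ratio(p1, p2, axis_start=0))     # ratio seen when the zero line is shown
print(apparent_ratio(p1, p2, axis_start=100))   # ratio seen when the scale begins at 100
```

With the zero line included, the second magnitude stands about 9 per cent
above the first; with the scale begun at 100, it appears twice as great.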

An arithmetic chart may also be a graph of relative figures,
in which change from time to time relative to some base is
pictured. Such a graph is Fig. 35. In this kind of graph, the
base is usually called arbitrarily 100 per cent, and the relative
changes above and below that base are graphed as percentages
of it. Figure 39 shows a magnitude at 105 in 1941 (5 per cent
above the base), at 95 in 1942, at 90 in 1943, and at 105 again in
1944. It is an extensive practice to convert time series into

[Figure 39: line chart. Vertical scale in relatives: 85, 90, 95, 100, 105,
110. Horizontal scale: 1941, 1942, 1943, 1944.]

Fig. 39. Chart of time series in relatives.

relatives, using some particular point in time as the base, and
when such relative series or "indexes" (as they are sometimes
called) are charted, the chart assumes the form indicated in
Fig. 39. The point of departure for reading such a chart is the
100 per cent line, which should be emphasized; the zero point
does not have to be shown on such a chart. The relative chart

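[Figure 40: line chart of Wednesday closing prices. Vertical scale in
relatives: 60, 70, 80, 90, 100, 110.]

Fig. 40. The percentage changes in the prices of 354 industrial stocks (1935-
1939 = 100). (Survey of Current Business, Weekly Supplement, Apr. 29, 1943.)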

ftWiVA Wi, W tvVAi tass* aii avVadri Taxv ilralti 

already in the percentage form and the zero per cent may be
the significant point of departure rather than 100. Thus the
raw data may be percentages of population paying income taxes
in successive years. In such a case the zero line is important,
the raw data themselves being in percentage figures.




Figure 40 is an illustration of a graph of a relative time series.
It shows the month-to-month variation, compared with the
average of monthly figures, 1935-1939, in the prices of 354
industrial stocks; thus the average 1935-1939 equals 100 per cent.
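
The conversion of a time series into relatives, with the average of a base
period taken as 100 per cent, may be sketched in Python. The figures are
hypothetical and are not the stock-price data of Fig. 40; the function name
`to_relatives` is an assumption made for this illustration.

```python
def to_relatives(values, base_values):
    """Express each value as a percentage of the average of the base-period values."""
    base = sum(base_values) / len(base_values)
    return [100.0 * v / base for v in values]

# Hypothetical figures: the base-period average of 50 serves as 100 per cent.
base_period = [40, 50, 60]
later_values = [52.5, 47.5, 45, 52.5]

print(to_relatives(later_values, base_period))
```

A relative of 105 means that the magnitude stands 5 per cent above the
base-period average, and a relative of 90 that it stands 10 per cent below.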

Ratio Charts. The second type of chart for graphing time
series is the ratio chart, which is designed to picture relative rate
of change. According to Wesley C. Mitchell, the idea of the
ratio chart was introduced by Jevons in 1863-1865.¹ But the
ratio chart did not come into general use until its advantages
were explained by Prof. Irving Fisher and James A. Field, in
1917.²



The great popularity in recent years of the ratio chart has
been largely due to the fact that special graphing paper has
been made for the purpose, the work of making such a chart being
thus vastly simplified.

In the case of the arithmetic chart, equal rises on the chart
per unit of time represent a constant rate of increase; in the
case of the ratio chart, equal rises per unit of time represent
a constant relative rate of increase. This is illustrated by the
comparison of the left with the right scale in Fig. 41. This
figure is a simple illustration showing a magnitude changing
at the same relative rate, BB', and a magnitude changing at
a constant rate, AA', both plotted on a ratio scale. The BB'
magnitude doubles in each time period. The AA' magnitude
increases in each time period by the constant difference of 4.

1 Mitchell, W. C., Business Cycles, p. 209.

2 Fisher, Irving, "The 'Ratio' Chart for Plotting Statistics," Publications
of the American Statistical Association, Vol. 15 (June, 1917), pp. 577-601;
Field, James A., "Some Advantages of the Logarithmic Scale,"
Journal of Political Economy, Vol. 25 (October, 1917), pp. 805-841. Cited
from Mitchell, op. cit., p. 209.

Notice the scale of logarithms at the right, which corresponds
to the scale of natural numbers at the left. These logarithms
are to the base 2. Thus, the $\log_2$ of 64 is 6 because $2^6 = 64$,
$\log_2$ of 32 is 5 because $2^5 = 32$, etc. It is evident, of course, that
while the scale at the left is in geometric progression, the scale
at the right is in arithmetical progression. This is a
characteristic of ratio paper. Ratio charts have no zero line, and there
is no point of emphasis. The attention is directed to the shape
and fluctuations in the curve. In the case of the arithmetically
ruled chart, growth at a constant difference is a straight line;
the greater the difference, the steeper the line; but it is still a
straight line if the difference is constant. In the case of the
ratio chart, growth at a constant relative rate is a straight line;
the greater the constant relative rate, the steeper the line; but
it is still straight.
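
The correspondence between the two scales can be checked with a few lines
of Python. The range of natural numbers from 1 to 64 is an assumption made
for this illustration; the text cites only the values 32 and 64.

```python
import math

natural_scale = [1, 2, 4, 8, 16, 32, 64]              # geometric progression
log_scale = [math.log2(v) for v in natural_scale]     # logarithms to the base 2

print(log_scale)
print(math.log2(64), math.log2(32))
```

Each doubling on the natural scale adds exactly one unit on the
logarithmic scale, which is why the latter is in arithmetical progression.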

On arithmetical paper, changes in differences produce curves
or irregular lines. On ratio paper, changes in relative rates of
change produce curves or irregular lines. The vertical scale of
the arithmetical chart is an arithmetic progression. The vertical
scale of the ratio chart is in geometric progression, but the
logarithms of the natural scale on a ratio chart are in arithmetical
progression. For this reason, the ratio chart is often called the
semilogarithmic chart. One method of plotting a ratio chart is to
find the logarithms of the raw data and then plot the logarithms
on arithmetically ruled paper. The results are the same as if
the natural data were plotted on a ratio scale. The labor of
looking up logarithms is avoided by having the scale made into
a logarithmic one, upon which the plotting of natural data will
produce the same effect as if the logarithms were found and
plotted. This is shown in a very simple case in Fig. 41, in
which the scale in logarithms is at the right and the scale in raw
data units is at the left. As already explained above, the line
BB' represents a variable that increases at a constant relative
rate, while the line AA' represents a variable that increases by a
constant quantity. In Fig. 41 the vertical distance between each
of the scale markings on the left represents just double the
absolute amount of the same vertical distance immediately
below it and just half the absolute amount of the same vertical
distance immediately above it. In this figure the variable that
doubles every year follows the straight line BB' (it was a curve
in Fig. 37). A variable that increases by the same aggregate
amount each year and hence follows a straight line in Fig. 37
would follow a curved path on a ratio chart, such as line AA' of
Fig. 41.

Since the logarithm of the ratio between two quantities is equal
to the difference between their logarithms, ratio paper can be
easily “calibrated” by the use of a logarithmic scale. Thus, if
equal vertical distances are taken to measure equal aggregate
differences between logarithms, then these same vertical distances
will represent equal relative distances (equal ratios) between the
antilogarithms of the logarithmic scale. In Fig. 41, for example,
the unit vertical distance is taken to be a unit difference between
logarithms to the base 2, and the logarithmic scale on the right
reads 0, 1, 2, 3, 4, etc. Since the antilogarithm of a number to
the base 2 is equal to 2 raised to the power of that number, the
antilogarithms of the logarithmic scale become 1, 2, 4, 8, 16, etc.
This is the scale shown on the left. It is evident that while the
scale on the right is in arithmetic progression the scale on the left
is in geometric progression. Accordingly, if paper is ruled so as to
be in arithmetic progression with respect to some logarithmic
scale but is marked or calibrated in terms of the antilogarithms
of the logarithmic scale, any variable plotted on this paper in
accordance with the antilogarithmic scaling will indicate a constant
rate of growth or decline wherever it traces out a straight
line.
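
The calibration just described can be checked numerically. The following is only a minimal sketch in Python, not part of the original exposition; the function name `scale_position` and the particular scale markings are assumptions chosen to match the base-2 scale of Fig. 41.

```python
import math

def scale_position(value):
    """Vertical position of `value` on paper ruled to a base-2 logarithmic scale."""
    return math.log2(value)

# Scale markings on the left of the chart (a geometric progression).
markings = [1, 2, 4, 8, 16, 32, 64]

# Their positions on the logarithmic scale at the right.
positions = [scale_position(m) for m in markings]

# Distances between successive markings (an arithmetic progression: all equal).
steps = [b - a for a, b in zip(positions, positions[1:])]

# The logarithm of a ratio equals the difference of the logarithms,
# so a given ratio always occupies the same vertical distance.
ratio_distance = scale_position(64 / 32)
difference = scale_position(64) - scale_position(32)

print(positions)
print(steps)
print(ratio_distance, difference)
```

Every doubling occupies one unit of vertical distance, wherever on the scale it occurs, which is why a constant rate of growth traces a straight line.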

Most ratio paper is ruled in accordance with a logarithmic
scale to the base 10, since this is the base of common logarithms.
An example of this kind of “semilogarithmic paper” (as it is
often called because the vertical scale is logarithmic while the
horizontal scale is arithmetic) is shown in Fig. 42. The reason
common logarithms are to the base 10 is that numbers are
arranged upon a decimal system and, by taking the base 10 for
logarithms, the integral part of the logarithm (characteristic) is a
mere record of the position of the decimal point in the original
number. The number 10 raised to the zero power is 1, and so the
logarithm of 1 is zero; the number 10 raised to the second power
is 100, and so the logarithm of 100 is 2; the number 10 raised to
the third power is 1,000, and so the logarithm of 1,000 is 3; and
so on, indefinitely. Likewise any number between 1 and 10 will
have a logarithm (to the base 10) whose characteristic is 0; any
number between 10 and 100 will have a logarithm whose
characteristic is 1; etc. The fractional part of a logarithm (its
mantissa) is the same for all similar successions of similar digits.
The fractional part of the logarithm to the base 10 for the number
2 is the same as the fractional part of the logarithm for 20 or
200 or 2,000, etc., namely, 0.3010; but the characteristic of the
logarithm of 2 is 0, the characteristic of the logarithm of 20 is 1,
the characteristic of the logarithm of 200 is 2, and so on. Thus
the entire logarithm of 2 is 0.3010, the entire logarithm of 20 is
1.3010, the entire logarithm of 200 is 2.3010, etc. Hence, when
the base of the logarithm is 10, logarithmic markings of -2, -1,
0, 1, 2, 3, etc., represent antilogarithmic markings of 0.01, 0.1,
1, 10, 100, 1,000, etc.
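
The division of a common logarithm into characteristic and mantissa can be illustrated with a short computation. This is a sketch only; the function name `characteristic_and_mantissa` is an assumption introduced for illustration, and the mantissa is rounded to four places to match the four-place values quoted in the text.

```python
import math

def characteristic_and_mantissa(number):
    """Split the base-10 logarithm of `number` into its integral part
    (characteristic) and its fractional part (mantissa)."""
    log_value = math.log10(number)
    characteristic = math.floor(log_value)
    mantissa = log_value - characteristic
    return characteristic, round(mantissa, 4)

# The same succession of digits gives the same mantissa;
# only the characteristic changes with the decimal point.
for n in (2, 20, 200, 2000):
    print(n, characteristic_and_mantissa(n))
```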

Semilogarithmic paper to the base 10 is usually ruled to
represent either one logarithmic unit and the fractional parts
thereof, corresponding to equal tenths on the antilogarithmic
scale (called “one-cycle paper”), or two logarithmic units and
the fractional parts of each, corresponding to equal tenths on
the antilogarithmic scale (called “two-cycle paper”), or three
logarithmic units and the fractional parts of each, corresponding
to equal tenths on the antilogarithmic scale (called “three-cycle
paper”). All three of these types of logarithmic rulings are
shown in the right part of Fig. 42. Since the logarithmic scale
is in arithmetic progression, these rulings would be the same for
any logarithms differing by one, two, or three units; they would
apply to logarithms running from -2 to 0, as well as from 0 to 2.
Thus the corresponding antilogarithmic scale can be selected
by the statistician in accordance with his needs. If his data run
from 2 to 800, for example, he would select three-cycle
semilogarithmic paper and make his scale as indicated on the left
of Fig. 42. If his data ran from 200 to 80,000, he would also
select three-cycle semilogarithmic paper and make his scale from
100 (at the bottom) to 100,000 (at the top). If his data ran from
0.2 to 8, he would choose two-cycle semilogarithmic paper and
make his scale from 0.1 (at the bottom) to 10 (at the top).
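
The choice of paper in these examples follows a simple rule: take the power of 10 at or below the smallest value and the power of 10 at or above the largest value, and count the logarithmic units between them. The sketch below is one way of stating that rule; the function name `choose_cycles` is hypothetical, and the rule itself is inferred from the three examples rather than stated in the text.

```python
import math

def choose_cycles(lowest, highest):
    """Return (bottom of scale, top of scale, number of cycles) for base-10
    ratio paper wide enough to hold data running from `lowest` to `highest`."""
    low_exp = math.floor(math.log10(lowest))    # power of 10 at or below the data
    high_exp = math.ceil(math.log10(highest))   # power of 10 at or above the data
    return 10.0 ** low_exp, 10.0 ** high_exp, high_exp - low_exp

# The three ranges of data mentioned in the text.
for low, high in [(2, 800), (200, 80000), (0.2, 8)]:
    print(low, high, choose_cycles(low, high))
```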

Figure 42 is an illustration of a three-cycle ratio scale for the
plotting of a time series by months for 6 years. The scale as
drawn reads from 1 to 1,000, but it could be made to read from
10 to 10,000, or from 100 to 100,000, etc. At the right of the
figure are shown the three most generally used types of ratio
scales, the three-cycle ratio scale, the two-cycle ratio scale, and
the one-cycle ratio scale. If the extreme fluctuations of a time
series are 60 and 3,000, it would be necessary to use three-cycle
paper; on the other hand, if the extreme fluctuations are 60 to
500, it would be necessary to use only two-cycle paper.

Fig. 42. Three-cycle semilogarithmic paper.

Figures 43 and 44 are intended to illustrate the advantages and
disadvantages of the ratio chart. Figure 43 shows the
comparative growth of some famous cities of the United States on an
arithmetic scale, and Fig. 44 shows the same data plotted on a
ratio scale. These data are also shown in Table 7. It will be
noticed that on an arithmetic scale it is not possible to bring the
New York City population growth curve into the picture. On
the ratio paper it is possible. Of course, on the arithmetically
ruled paper New York City population could be plotted on a
different scale, but then the arithmetic comparison between
New York City and the other cities would be lost, since the
height of the curve from the zero line is what counts in the
comparison on arithmetic paper.



The advantage of the ratio chart is threefold: (1) It makes
possible a quick answer to the question as to whether a magnitude
is changing its rate of growth. (2) It clearly pictures the relative
significance of fluctuations; for example, a given relative change
in a small magnitude appears as important as the same
relative change in a large magnitude. On an arithmetic chart
the latter would appear much larger. If an arithmetic chart of
almost any item of production in the United States, say from
1800 to 1940 by years, is constructed, the fluctuations in the
curve for the earlier period will be minute, while the fluctuations
in the curve for the latter part will loom very large. In such
cases, the conclusion has therefore sometimes been reached that
instability is greater now than formerly. Plotting the same data
on ratio paper would in most cases show that the earlier
fluctuations were relatively as great as or greater than the modern
ones. (3) It facilitates comparisons between time series in order
to detect correlation between them.
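
The second advantage may be made concrete with a small computation. The two series below are hypothetical figures invented for illustration (they are not production data); the later series is simply the earlier one multiplied by 100, so that the relative fluctuations are identical. The function names are likewise assumptions.

```python
import math

# Hypothetical series: the same relative swings at two very different levels.
early = [10, 12, 9, 11]
late = [1000, 1200, 900, 1100]

def arithmetic_changes(series):
    """Period-to-period differences, as they appear on an arithmetic chart."""
    return [b - a for a, b in zip(series, series[1:])]

def log_changes(series):
    """Period-to-period differences of logarithms, as they appear on a ratio chart."""
    return [round(math.log10(b) - math.log10(a), 4)
            for a, b in zip(series, series[1:])]

print(arithmetic_changes(early), arithmetic_changes(late))
print(log_changes(early), log_changes(late))
```

On the arithmetic scale the later fluctuations are a hundred times the size of the earlier ones; on the ratio scale the two sets of fluctuations coincide.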



Fig. 44. Growth of certain cities in the United States (logarithmic, or ratio,
scale). (Horizontal axis: years, 1790 to 1950.)

The disadvantage of the ratio chart is that it is not possible
to make magnitude comparisons. For illustration, if the
attempt were made to compare the actual size of Trenton, N.J.,
and New York City in 1930, an entirely incorrect impression
would be created: Trenton would appear from the ratio chart
to be about half as large as New York City in 1930 if vertical
distance were assumed to be magnitude. When the ratio chart
is used, such magnitude comparisons must be made by the use
of the raw figures themselves, which should always be given in a
table of figures along with the chart.
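
The size of the false impression can be worked out from the raw figures. The sketch below uses the 1930 populations given in Table 7 (in thousands) and assumes, for illustration only, that the ratio scale begins at 1 (thousand) at the bottom of the chart, so that the height of a point is the common logarithm of its value.

```python
import math

trenton_1930 = 123.4      # thousands, Table 7
new_york_1930 = 6930.4    # thousands, Table 7

# True comparison of magnitudes, from the raw figures.
actual_ratio = trenton_1930 / new_york_1930

# Comparison of vertical heights on the ratio chart
# (assumption: the scale starts at 1, so height = log10 of the value).
height_ratio = math.log10(trenton_1930) / math.log10(new_york_1930)

print(round(actual_ratio, 3), round(height_ratio, 2))
```

Under this assumption Trenton stands a little more than half as high on the chart as New York City, although its population was less than 2 per cent as large.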






Table 7. Population of Specified Cities in the United States from
Earliest Census to 1940
(In thousands)

Year   Trenton   Portsmouth     Omaha    New York
          N.J.         N.H.      Neb.       City¹

1790                    4.7                  49.4
1800                    5.3                  79.2
1810       3.0          6.9                 119.7
1820       3.9          7.3                 152.1
1830       3.9          8.0                 242.3
1840       4.0          7.9                 391.1
1850       6.5          9.7                 696.1
1860      17.2          9.3       1.9     1,174.8
1870      22.9          9.2      16.1     1,478.1
1880      30.0          9.7      30.0     1,911.7
1890      57.5          9.8     140.5     2,507.4
1900      73.3         10.6     102.6     3,437.2
1910      96.8         11.3     124.1     4,765.9
1920     119.8         13.6     191.6     5,620.0
1930     123.4         14.5     214.0     6,930.4
1940     124.7         14.8     223.8     7,455.0

Source: Sixteenth Census of the United States, 1940, Vol. I, Population, pp. 32 and SBft
¹ Refers to New York City and its boroughs as constituted in 1940.


FREQUENCY SERIES 

Definition of a Frequency Series. A convenient arrangement
of any set of data is a classification according to magnitude,
that is, from smallest to largest. In the case of a time series,
time seems to be the most logical and workable basis of
classification, because it seems reasonable to view things as they occur in
time. There is a rationality about such a procedure. But
another aspect of data, unrelated to time, may be important.
For example, how many different prices of sugar during a given
week differed from the average price for that week, and in what
respect did they differ, or from how wide a range of fluctuations
in price during the week were the respective average weekly
prices calculated? This particular aspect would have no
reference to time, except as a matter of definition of the unit
involved (one would not take prices of the third week in March
to study the average price in the first week of March). When
the arrangement of data according to time of occurrence is not
significant, it seems rational to classify the data in a series from
smallest to largest. When this is done, the resulting series of
data is called an “array.”

Following is an example of an array:*

An Array of 10 Children in Third Grade, by Age

          Age,
          Years
X₁         7¼
X₂         7½
X₃         7¾
X₄         8
X₅         8¼
X₆         8½
X₇         8¾
X₈         9
X₉         9¼
X₁₀        9½

Variable X arranged according to magnitude, where X = age
of children in third grade, X₁ = age of youngest child, etc.,
until X₁₀ is age of oldest child.

The situation may be one where there are a number of children
of each age, for example:

From this array, it is noticed that there is 1 child 6¼ years old
and there are 2 children 6½ years old, 3 children 6¾ years old, etc.
Inasmuch as there are really only eight variations of the
variable X, some of which occur more than once, the above is
more conveniently summarized as follows:

Number of Children of Specified Age among 18 Children in Third
Grade

        Age, Years, of Children     Number of Children
        in Third Grade              of Specified Age
                  X                         F

X₁               6¼                         1         F₁
X₂               6½                         2         F₂
X₃               6¾                         3         F₃
X₄               7                          4         F₄
X₅               7¼                         3         F₅
X₆               7½                         3         F₆
X₇               7¾                         1         F₇
X₈               8                          1         F₈

                                   18 = N = ΣF

This is called a “frequency series” or a “frequency
distribution”; the variable is listed in a column in the form of an array,
and in a second column the frequencies of each variation are
set down. It is merely a condensed form of the array and is
particularly convenient, as may be readily imagined, when a
large number of cases is studied. It will be noticed that a new
symbol is introduced, but it is a very simple one and one that
readily suggests itself. F₁ refers to the number of times X₁
occurs, F₂ the number of times X₂ occurs, etc. F stands in
general for the frequency of occurrence of a variation; 18 is the
total number of cases and is therefore the sum of the F's, and
this is written ΣF (F₁ + F₂ + F₃ + · · · + Fₙ = ΣF).
However, a more general way to symbolize the total number of
cases is to use a large N. Either ΣF or N could be used, but
it is conventional in statistics to use N to represent ΣF. The Σ
is the capital Greek letter sigma, and it is always used in statistics
to signify “total of.”
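
The condensation of an array into a frequency series is a matter of counting. The sketch below is illustrative only: the 18 ages are an assumed reading of the table above (quarter-year steps from 6¼ to 8, written as decimals), and the names `ages`, `frequency`, and `n_cases` are introduced here for the purpose.

```python
from collections import Counter

# Assumed array of the 18 ages, in years, arranged from smallest to largest.
ages = [6.25, 6.5, 6.5, 6.75, 6.75, 6.75, 7, 7, 7, 7,
        7.25, 7.25, 7.25, 7.5, 7.5, 7.5, 7.75, 8]

# F for each distinct value of X: the number of times that value occurs.
frequency = Counter(ages)

# N is the sum of the frequencies, that is, the total number of cases.
n_cases = sum(frequency.values())

for age in sorted(frequency):
    print(age, frequency[age])
print("N =", n_cases)
```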

dren, for example, height or weight. There must be a common
characteristic or attribute, that is, a variable magnitude capable of
quantitative measurement.

Nature of a Frequency Distribution and Illustration. The idea
of the array and of the frequency distribution in its barest
simplicity has been illustrated. From the example, it is seen
that the frequency distribution is merely the commonplace and
rational arrangement of a set of data in order of magnitude.
As indicated elsewhere, this form of arrangement discloses a
natural order that appears to persist in all things,¹ namely,
that in a large number of observations of a common characteristic
of a thing the following tendencies exist:

1. A large number of frequencies cluster about a central
magnitude or average, which occurs most frequently.

2. Small variations above and below this central magnitude
are numerous.

3. Large variations are much less frequent.

4. Extreme variations are rare.

Following is an example of a frequency distribution showing
the number of cities of 100,000 or more population that have
specified death rates from puerperal causes:


Table 8. Maternal Mortality in Cities of 100,000 or More
Population in the United States, 1938

Death Rates
(Number per 1,000        Number of
Live Births)             Cities
        X                   F

        1-                  2
        2-                 16
        3-                 18
        4-                 20
        5-                 15
        6-                 10
        7-                  4
        8-                  6
        9-                  0
       10-                  2

                           93

Source: Bureau of the Census, “Vital Statistics,” Special Reports, Vol. 9, No. 7 (Feb. 10,
1940), pp. 125-126.


The average maternity death rate for these 93 cities is 4.8 per
1,000 live births. It will be noted that, instead of writing X₁,
X₂, X₃, . . . , for each variant of the variable X, the symbol X
is written at the head of the column, indicating that the column
consists of X₁, X₂, X₃, . . . , Xₙ. The symbol F is handled in a
similar manner. Furthermore, in this illustration, class intervals
of 1 are used, which is signified by the dash after each of the
numbers in the X column. This is because fractional rates are
given in the source and not merely rounded numbers. For
example, the death rate from puerperal causes in 1938 in the
city of Akron, Ohio, was 4.0; in the city of Albany, N.Y., it was
3.1; in the city of Atlanta, Ga., it was 4.4. Since the death
rates are given to one decimal place, if class intervals were not
used for the frequency table it would require some hundred or
more rows of figures to place the death rates in an array. The
symbol for the class interval is i. In this case i = 10 decimal
units, or 1. The average 4.8 was calculated by assuming that
cases in any class interval all had the value of the mid-point
of the interval.*

¹ See Chaps. VI and VII.
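
The calculation of the average from the grouped data can be sketched as follows. The class limits and frequencies are those of Table 8; the variable names are assumptions made for illustration, and each case is treated as lying at the mid-point of its class interval, as the text describes.

```python
# Lower limits of the class intervals ("1-" means 1 up to but not including 2).
lower_limits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
frequencies = [2, 16, 18, 20, 15, 10, 4, 6, 0, 2]

i = 1  # the class interval

# Mid-point of each class interval: lower limit plus half the interval.
midpoints = [x + i / 2 for x in lower_limits]

n = sum(frequencies)  # N, the total number of cities

# Average, assuming every case has the value of the mid-point of its interval.
mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

print(n, round(mean, 1))
```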

Discrete and Continuous Frequency Series. A discrete
frequency series is one in which the units of measurement are
more or less fixed by the character of the data. The phenomena
actually occur in such a manner that their variations in size
proceed by distinct jumps or steps. The unit of measurement is
fixed by this fact. An example of such a series is a frequency
distribution of interest rates in which the quoted variations
in rates are likely to fluctuate by jumps of a fraction of 1 per cent
and there are few if any intermediate variations. The variation
in the range of the actual cases is consequently by distinct
fractional steps. The variation throughout the range is not
by infinitesimal amounts. The very character of the data
determines the unit of measurement and its degree of refinement.
Where variation proceeds in this manner by discrete steps of
considerable magnitude as compared with the whole range of
X variation, it is probably best not to use a class interval. If the
number of different values of X that occur is too large for
convenience, however, then the data may be grouped into class
intervals. Great care should be employed in this case to see
that the class intervals are chosen so that the possible values of
X are placed in a balanced position throughout the intervals.
For example, if the values of X occur at 2, 4, 6, 8, etc., and if
grouping is desired, a class interval of size 4 might be chosen,
running from 1 up to but not including 5, from 5 up to but not
including 9, etc. These would balance the actual X values
around the center of each interval. On the other hand, intervals
of 4, running from 0 up to but not including 4, from 4 up to but
not including 8, etc., would result in the actual X values
occurring at the lower limit and middle of each interval, causing an
upward bias if the cases are assumed to be concentrated at the
mid-points of the intervals, as is usual.¹ If the discrete data vary
by steps that are small in relation to the range of variation in the
data (e.g., in steps of 1 cent over a range of $100), then the data
might reasonably be treated as if they were continuous.

* For a more complete discussion of the class interval and calculation of
averages, see Chap. VII.
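
The effect of balanced and unbalanced class intervals upon the average can be shown with a small computation. The eight values below are hypothetical discrete data occurring at even numbers; the function name `grouped_mean` and its parameters are assumptions introduced for this sketch.

```python
def grouped_mean(values, start, width):
    """Mean computed as if every case lay at the mid-point of its class interval.
    Intervals run from `start` upward, each including its lower limit only."""
    total = 0.0
    for v in values:
        lower = start + ((v - start) // width) * width  # lower limit of the class
        total += lower + width / 2                      # mid-point of the class
    return total / len(values)

values = [2, 4, 6, 8, 10, 12, 14, 16]   # hypothetical discrete data

true_mean = sum(values) / len(values)

# Intervals 1-5, 5-9, etc.: the values are balanced about each mid-point.
balanced = grouped_mean(values, start=1, width=4)

# Intervals 0-4, 4-8, etc.: the values fall at the lower limit and the middle.
unbalanced = grouped_mean(values, start=0, width=4)

print(true_mean, balanced, unbalanced)
```

With the balanced intervals the grouped average agrees with the true average; with the unbalanced intervals it is too high, which is the upward bias referred to in the text.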

A continuous series is one representing a phenomenon that 
varies by infinitesimal amounts. It may have the appearance 
on the statistical table of the same discreteness as the discrete 
series; but this is because the arbitrarily discrete character of 
the unit of measurement eclipses the actual continuous character 
of the data. In a continuous series the range of the interval 
is obtained by a process of testing and finding the one that 
appears best to smooth the data, following the general rules for
determining the class interval discussed later.² Frequency
series of all growth phenomena are of the continuous type.
For example, the frequency distributions of weights or heights
of people of some specified age are continuous in character. 
In passing from one height to another, the individual must 
necessarily pass through every minute difference between; and 
accordingly in measuring the heights of individuals at the same 
age (or of mature people) the variants will be by minute or 
infinitesimal differences. The units of measurement, however, 
will make them appear discrete in character. 

Charts of Frequency Distributions. A frequency table is the
presentation of a series of variable magnitudes, usually arranged
from smallest to largest, in such a manner as to record the
frequencies of the different magnitudes. For purposes of graphing
it is conventional to use the x-axis for the variable magnitude
and the y-axis for the frequencies. For illustration, in Fig. 45,
the x-axis shows the variations of magnitude (death rates from
puerperal causes in 1938) and the y-axis the frequencies (the
number of cities of 100,000 or more population) of those death
rates, so that the points appearing from the left to the right
signify the following:

Death rates in 1938 from puerperal causes:
2 cities have death rates between 1 and 2 per 1,000 live births
16 cities have death rates between 2 and 3 per 1,000 live births
. . .
6 cities have death rates between 8 and 9 per 1,000 live births
0 cities have death rates between 9 and 10 per 1,000 live births
2 cities have death rates between 10 and 11 per 1,000 live births

¹ Cf. Chap. VII.

² See Chap. VII.

The points are plotted over the mid-points to indicate that
the frequencies cover the class interval and not merely the
rounded quantities shown on the scale. Accordingly F₁, or 2,
is plotted directly over 1.5; F₄, or 20, is plotted directly over 4.5;
etc. It is easily seen from the figure that the peak of the
frequencies is in the interval containing the average. It can also
be seen that numerous small variations from the average occur,
but large variations from the average are few in number; that is,
the frequency polygon slopes rapidly downward on each side
of the average, where the frequency is highest. Variations of
1 below average death rate (death rate of about 3.8) lie in the
class interval having 18 cases; variations of 1 above average
death rate (death rate of about 5.8) lie in the class interval having
15 cases. Variations of 3 below and above average are much
less frequent: only 2 cases are in the class interval containing
death rate 1.8, and only 4 cases are in the class interval containing
death rate 7.8.

Instead of a polygon to trace the direction of frequencies,
the practice of using bars to depict frequency distributions is
often followed. Figures 46 to 48 are illustrations of such graphs
of frequency distributions. It is possible also to fit a curve to
the points either by freehand or by mathematical means and
thus describe graphically the frequency distribution by a curve,
which is called a “frequency curve.”¹

Fig. 46. Distribution of 617 wholesale price items by percentage of price
change, 1926-1929. (National Resources Committee, The Structure of the American
Economy, Part I, pp. 128 and 131.) (Horizontal axis: per cent change in price.)

In Figs. 46 to 48 it is interesting to compare the concentration
of percentage changes in the three different periods, namely,
1926-1929, when prices and economic activity were
comparatively stable; 1929-1932, when prices and economic activity
were on the decline; and 1932-1937, when prices and economic
activity were increasing. Figure 46, depicting the distribution
of percentage price changes, 1926-1929, is quite symmetrical,
and the slope on each side of the maximum frequency is rapid;
the position of the mean (wholesale price index for all
commodities, or -4.7) is close to midway between the two extreme ranges
of the variable. In Fig. 47, however, there is no such symmetry.
On the contrary, there is a piling up of cases in the
negative direction so that the slope to the left of the maximum
frequency is gradual while the slope to the right is parabolic; the
distribution appears to have a tail in the negative direction.
Figure 48, on the other hand, shows the opposite tendencies,
with the appearance of a tail extending in the positive direction.
Figures 49 and 50 illustrate the use of frequency curves in
chemical studies.

¹ See Chap. VI.

Fig. 47. Distribution of 617 wholesale price items by percentage of price
change, 1929-1932. (National Resources Committee, The Structure of the American
Economy, Part I, pp. 128 and 131.) (Horizontal axis: per cent increase in price.)

Fig. 48. Distribution of 617 wholesale price items by percentage of price
change, 1932-1937. (National Resources Committee, The Structure of the American
Economy, Part I, pp. 128 and 131.) (Horizontal axis: per cent change in price.)



Figures 51 and 52 are illustrations of the use of frequency
histograms in biochemical studies.

While the frequency distribution in Fig. 45 is in the form of a
polygon, those of Figs. 46 to 48 and 51 and 52 are shown by
outline bars. When a frequency distribution is drawn with bars,
the graph is called a “histogram.”



Curve legends: citrylidenecrotonaldehyde a; ψ-ionylideneacetaldehyde a;
citrylidenecrotonaldehyde a semicarbazone; ψ-ionylideneacetaldehyde a
semicarbazone; citrylidenecrotonaldehyde b; ψ-ionylideneacetaldehyde b;
citrylidenecrotonaldehyde b semicarbazone; ψ-ionylideneacetaldehyde b
semicarbazone.

Figs. 49 and 50. Analysis of the semicarbazone, melting point 178-179°,
proved it to be derived from an aldehyde C15H22O. The position of its absorption
maximum at 3250 A. and that of the free aldehyde (3150 A.) regenerated on
hydrolysis with phthalic anhydride are in excellent agreement with the positions
found for citrylidenecrotonaldehydes and their semicarbazones. [Barraclough,
B., J. W. Batty, I. M. Heilbron, and W. E. Jones, “Studies in the Polyene Series,
Part I,” Journal of the Chemical Society (London), October, 1939, p. 1551.]


Frequency Distribution Plotted on a Ratio Scale. At an earlier
point in this chapter (page 131) the effect of plotting a time series
on a ratio scale (semilogarithmic paper) was discussed. For
some purposes the use of similar paper for the plotting of a
frequency series is desirable. Figure 53 shows the effect of
plotting on a ratio scale the frequency distribution showing the
number of cities having specified death rates from puerperal
causes. The frequency distribution when plotted on the
arithmetic scale as shown in Fig. 45 appears to be unsymmetrical
. . . magnitude (continuing the use of an arithmetic scale for the
frequencies), as illustrated in Fig. 53, has reduced this contrast to
such an extent that the slopes on either side are almost the same
and the frequency polygon appears to be almost symmetrical.

Figs. 51 and 52. Showing distribution of daily Ca excretion for groups of rats.
Figure 51 shows results of 793 determinations of urinary Ca (mg./100 g./24 hr.)
under standard conditions. Figure 52 shows differences (3S3 values) between
test and arbitrarily selected control groups. In both cases the results correspond
with a normal distribution. [Truszkowski, R., J. Blauth-Opieńska, and J.
Iwanowska, “Parathyroid Hormone,” The Biochemical Journal (London), Vol. 33
(1939), p. 1007.] (Horizontal axes: absolute Ca excretion; differences between groups.)

Fig. 53. Death rates in 1938 from puerperal causes. (Cf. Fig. 45.)

An interesting application of logarithmic frequency-distribution
analysis has recently been made in entomology¹ by
C. B. Williams, who says:

Mr. Yule shows that the frequency distribution of sentence length
(i.e., number of words between successive full stops) is of the skew type
and by comparing two different manuscripts . . . he is able to produce
convincing mathematical evidence on the identity or otherwise of their
authorship. . . . When I converted some of Yule’s tables into diagrams
I was struck by their general resemblance to skew distributions with
which I have recently been dealing in some entomological problems,
. . . which distributions, I found, became normal and symmetrical if
the logarithm of the number was taken as a basis for subdivision into
groups instead of the number itself.

Taking the logarithm of the number as a basis for subdivision 
into groups instead of the number itself accomplished the same 
end as the plotting of the original groups on a logarithmic or 
ratio scale. 
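
The effect described by Williams can be imitated with a few invented figures. The nine values below are hypothetical (they are not taken from Williams or from Yule); they are grouped first on the number itself and then on its logarithm to the base 2. The class widths are likewise assumptions chosen for the illustration.

```python
import math
from collections import Counter

# Hypothetical skew data: many small values and a long tail of large ones.
values = [1, 2, 2, 4, 4, 4, 8, 8, 16]

# Grouping on the number itself: classes of width 4, labeled by lower limit.
arithmetic_classes = Counter((v // 4) * 4 for v in values)

# Grouping on the logarithm of the number: classes of width 1 in log2 units.
log_classes = Counter(math.floor(math.log2(v)) for v in values)

print([arithmetic_classes[k] for k in (0, 4, 8, 12, 16)])
print([log_classes[k] for k in range(5)])
```

The first set of frequencies trails off toward the large values; the second rises and falls symmetrically about its middle class.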


GROWTH CURVES

Not all curves shaped like frequency polygons or curves are,
in truth, graphs of frequency distributions. Some growth curves
assume shapes very similar to frequency curves.² Figure 54
is an illustration of a growth curve, showing the increase in
Chlorella vulgaris cultures over a period of hours. The two
curves contrast the peak of growth for two different-sized
inoculums; in both cases the rate of multiplication per cell varied
inversely with the density of population, not only in the early
stages of growth but throughout the growth period in each
culture.

BIVARIATE SERIES 

Bivariates are cross classifications of two variable
characteristics possessed in common by the objects being studied.
Graphs of bivariates are sometimes confused with frequency
distributions because in some cases their shape resembles the
frequency curve. Charts of bivariates, however, may assume
almost any shape, and the center of the distribution may have no
more importance than any other part of it. Good examples of
bivariate comparisons may be found among the great variety
of vital events when they are related to the different ages by
their frequency of occurrence.

¹ Williams, C. B., “A Note on the Statistical Analysis of Sentence-length
as a Criterion of Literary Style,” Biometrika, Vol. 31 (1940), Parts III, IV,
pp. 356-361.

² For other types of growth curves, see Chap. XX.

Table 9 and Fig. 55 present a set of such distributions. These
are death bivariate comparisons. The X scale in these charts
is the X variation in age from childhood to old age, representing a
heterogeneous scale with respect to many vital events, such as
susceptibility to certain types of disease, accident, etc.
Difference in age constitutes in itself an attribute introducing lack of
homogeneity where such a reference is made of it. With
reference to many types of diseases, man at very tender ages and at
old ages is a different being from man at middle life or in the
prime of youth. Such bivariates have no reference to central
tendencies; the matter of central tendencies is irrelevant.
What is sought is a picture of the association between the two
variables, and the very character of the data is such that there
can be no expectation of a piling up of frequencies about one
average or central tendency. Figure 55 is presented to show a
number of examples of bivariate charts. It is readily seen that
when the purpose is understood such charts are very useful as
a method of picturing vital statistics, but merely because the
shape of the two last examples resembles the frequency polygon
it does not follow that these are true frequency distributions.

Fig. 54. Growth curve showing the rate of increase in population in Chlorella
vulgaris cultures as a function of time. [Pratt, Robertson, “Influence of the
Size of the Inoculum on the Growth of Chlorella vulgaris in Freshly Prepared Culture
Medium,” American Journal of Botany, Vol. 27 (January, 1940), p. 61.]

Fig. 55. Examples of bivariate comparisons.


Table 9. Death Rates per 100,000 Population in the United States,
1929, from Specified Causes, by Age*

Age    Tuberculosis     (heading      Cerebral      Broncho-     Puerperal
       of the lungs,    illegible)    hemorrhage    pneumonia    septicemia
       male whites

0            7.01         11.29           2.06        182.00
5-           2.27          5.81           0.59          6.44
10-          3.13          1.74           0.47          2.85         0.15
15-         22.37          1.11           0.83          3.42         9.94
20-         56.33          0.01           1.50          4.33        23.01
25-         72.23          0.92           2.19          4.66        24.72
30-         80.34          0.79           4.24          6.70        22.25
35-         86.17          0.60           9.95          9.38        18.48
40-         95.60          0.1?          21.47         13.91         8.18
45-        101.03          0.18          45.22         16.47         0.99
50-        100.32          0.28          83.37         22.38
55-        105.27          0.09         170.99         29.77
60-        102.63          0.17         286.15         46.03
65-        114.62          0.23         506.14         77.03
70-        106.77                       814.09        124.62
75-        110.39                     1,323.92        238.82
80-         76.09 (80 and over)       2,015.65        445.22
85-                                   2,477.50        845.00
90-                                   2,365.00      1,03?.00
100-


* The rates in this table were calculated from data on total deaths by ages in the total
registration area of the United States in 1929, according to the Bureau of the Census (1932),
Thirteenth Annual Report on Mortality Statistics, 1929, pp. 196-197, 198-199, 202-203,
206-207, 210-211, and population of the United States by age groups as reported
in the Abstract of the Census (1930), p. 183. In 1929 the death registration area of
continental United States included 95.7 percent of the total population.
† Rates in italics based on less than 10 deaths.

The odd shapes that may be assumed by bivariate charts
are shown by these illustrations. They may be U shaped, they
may be J shaped, they may be S shaped, and, of course, they




may be shaped like an ordinary frequency distribution, but when 
they are, this is a matter of coincidence, without significance.¹
Figure 56 is an illustration of a bivariate chart of data in the 
field of the natural sciences, which is shaped like a frequency 
curve and which even uses the word “frequency” in the title of 
one of its units, though it is not a frequency curve. It is a chart 
of a bivariate comparison — the amplitude in centimeters com- 
pared with frequency in cycles per second. 



Fig. 56. Another bivariate comparison. Horizontal axis: frequency,
cycles per sec (27.1 to 27.7). [Clark, A. L., and L. Katz, "Resonance
Method for Measuring the Ratio of the Specific Heats of a Gas, Cp/Cv,"
Canadian Journal of Research, Vol. 18 (February, 1940), p. 30.]

Figure 57 shows the relationship between inventories and 
shipments of all manufacturing industries in the United States 
and is a bivariate chart. The dotted line on the figure represents 
the average relationship of inventories to shipments based on 
the 2½-year period from 1939 through the second quarter of 1941.
Deviations from this relationship by the quarterly items were
small during the base period, the expansion of inventories being 
generally in proportion to the expansion of shipments. In 
contrast, inventories increased phenomenally in relation to 
shipments during the latter half of 1941 and the first half of 

¹ Cf. also such a type of frequency distribution as that described by
Thomas V. Pearce, "An Unusual Frequency Distribution: the Term of
Abortion," Biometrika, Vol. 22 (1930-1931), pp. 250-252.





1942. Protective buying replaced immediate production needs
as a motive for much of the inventory accumulation during this
second period, and stocks expanded far out of line with the
indicated requirements of production, assuming that the shipments
give an indication of requirements for production.

Fig. 57. A third example of a bivariate comparison. [Source: Survey of
Current Business, Vol. 23 (1943), pp. 3-9.]


STATIC VARIATION AND DYNAMIC VARIATION 
In statistical analysis there are two general forms of variation.
The static form of variation is that occurring at a given point
in time or occurring in such a manner that time may be rationally
regarded as irrelevant to the variation. Where the variations
that occur are a function of time, however, the variation is
dynamic and requires different methods of analysis. In the
main, the methods of analysis of static variation center in the
treatment of the frequency distribution, whereas the methods
of analysis of dynamic variation call for a different application







of principles. The same fundamentals, however, are used in
the analysis of dynamic variation or time series as those used
for the analysis of frequency distributions; only the applications
differ.

Rational Frequency Distributions. A rational frequency
distribution is one in which that arrangement of the data is suggested
by the nature of the matter observed. Such a frequency
distribution is rational also because the variability of a common
characteristic is chosen as the basis of the particular classification
and this basis remains comparable among the objects measured.
Frequently, the same idea is expressed by saying that the data
are homogeneous; thus a rational frequency distribution means
one in which the variable is homogeneous.

Homogeneity may be defined as the condition prerequisite to 
comparability of data with respect to the attribute or factor 
being considered. The negative aspect of this condition is that
attributes not being considered are judged unimportant for the
purposes of the study in hand. The positive aspect of this
condition is that attributes or factors judged important for the
purpose of the study are taken into consideration.

For example, if the attribute height of human beings is being 
considered, color of eyes may be judged irrelevant and therefore 
is not considered. But, for a homogeneous study, age, sex, and
perhaps race are attributes that must be considered because they
are all correlated with the attribute height and cannot therefore
be judged unimportant in studying height. Unimportant
attributes (those ignored) have zero correlation with the
attribute studied. Attributes correlated with the attribute studied
must be taken into consideration in order to obtain homogeneous
data. In the example of heights, homogeneity is obtained by
classification, that is, by taking heights of a particular class in
which the correlated attributes are constants. Thus, heights of
mature Caucasian males may be taken as one homogeneous
group; another homogeneous group would be heights of mature
Caucasian females; another would be sixteen-year-old Caucasian
males; etc.

An important result of homogeneity is that no particular cause
of bias or cumulated variation is present. On the contrary, the
causes of variation consist of many minute mutually uncorrelated





(or independent) causes of variation that occur according to the
law of large numbers; in other words, in a random manner.¹

Irrational Frequency Distributions. By disregarding the
element of time present in a time series whose natural arrangement
is according to time occurrence, the data may be reclassified and
arranged in an array, or a frequency distribution. Such a
rearrangement would conceal the natural time sequence originally
present in time-series data when in their natural or rational order.
This type of frequency distribution is irrational as a method of
summarization. The multiple forces affecting variability in a
time series are not usually operative at random or in a mutually
independent manner. On the contrary, the causes of variation
may and usually do form a cumulative series of mutually
dependent variations. It is to be noted in passing that Figs. 46 to 48
are not distributions of time series. In the data for each of these
frequency distributions the attribute summarized is for a specified
time, and all the variables are for that specified time; thus time
is held constant and the variation shown in the histogram is
uncorrelated with time. These are rational frequency
distributions. But as soon as the data are viewed in their dynamic
aspect, that is to say, are correlated with time, the many biasing
attributes or factors of time destroy homogeneity in the data.²
For example, with respect to the price of sugar taken as a time
series, the supply at a subsequent period might tend to be larger
as a result of the relatively high price existing at the earlier
period, and as a consequence the previous high price is a cause
of a later lower price. The existence of a price situation, moreover,
at a given time may produce technological changes in the
production and distribution of sugar that in turn will be a dominant
factor in the determination of a subsequent price.

In spite of the fact that the procedure of reclassifying time
series and arranging the data in frequency distributions is
irrational, it has legitimate uses in statistical analysis. There
is a place for irrational procedure in the progressive development
of knowledge, but when used the user should be conscious of the
irrationality involved.³

¹ For careful consideration of the law of large numbers, see Chaps. IX-XI;
also see J. G. Smith and A. J. Duncan, Sampling Statistics and Applications,
pp. 101-103, hereafter referred to by the short title Sampling Statistics.
² For further discussion of time-series analysis, see Chaps. XIX ff.

³ Fischer, Ludwig, The Structure of Thought, p. 360. Cf. for illustrations
of such rearrangement of time series, Dickson H. Leavens, "Frequency





VARIABLE QUALITIES, OR ATTRIBUTES 

When statisticians have to deal with variable qualities, such as
different colors, different races, different climatic conditions,
different geographical locations, or different intellectual or moral
capacities, their problems are principally questions of the
consistent use of class or group distinctions. Usually there is no
need for elaborate quantitative treatment. Yet, so far as
possible, statisticians strive to convert quality, or attribute,
differences into quantitative terms, and when that is
accomplished, their analysis is similar to the analysis of frequency
series. It has been found, for example, that certain tests can
be made to provide quantitative measures of differences in
intelligence, native or acquired; and a large scope for statistical
analysis lies in the field of education and psychology through
the use of these tests.

Distributions Corresponding to Time Series," Journal of the American
Statistical Association, Vol. 26 (1931), pp. 407-415.



PART II 


Analysis of Frequency Distributions 

CHAPTER VI

SUMMARIZATION AND COMPARISON 

For summarization and comparison of static variation the
fundamental tool of analysis is the frequency distribution, its
graphic presentation, and the analysis of its characteristics. The
frequency distribution portrayed in a table or in a graph gives
a picture of the whole of the variation relative to some
particular matter, but how can comparisons be made? The
frequency table, especially if large numbers of magnitudes are
involved, even though it is admittedly better than a haphazard
arrangement of the data, requires study before the mind can
grasp its full significance. If two frequency distributions of
heights (e.g., of mature males in New Jersey in 1800 and of
mature males in New Jersey in 1900) are to be compared, the
frequency table could be used, but the total number of cases
measured might be different in each year taken, which would
make it more difficult to discern the similarity or lack of similarity
of the two distributions. To make comparisons, a chart could
be drawn, but a chart may be large or small depending on the
scale used, and differences would then appear from purely
arbitrary, mechanical causes having no real significance.
Moreover, if the heights of the same males and also their weights
are to be compared, a comparison of nonhomogeneous units
(inches of height and pounds of weight) is required. Clearly
some method or methods of summarization and comparison of
frequency distributions must be devised.
Use of Frequency Distributions. The common practice of
attaining a summary figure by "averaging" is familiar to all, but
it should be clear that an average, taken by itself, is indeed a





very "summary" expression for a variable. It is one value, used
to represent a whole series of variations; and a study of the
variations about the average may be as important as or more important
than the study of the average alone. In statistics and in most of
the fields of study that use statistics and statistical methods, the
average is generally a convenient point of departure for a more
adequate analysis of the variable.¹

Types of Comparison. There are six possible ways in which it
may be desirable to obtain summary figures and to make
comparisons. This may be explained by the use of diagrams, as
follows:

In Fig. 58, the central tendency, or average, is located at A,
which is plumb with the peak of the frequency curve. In this
figure the central tendency is typical, in the sense that it is a
magnitude that occurs more frequently than other magnitudes.
It may be looked upon quite rationally as a norm, or typical
value. In such a case, the average value has a significance for
itself, as a summary value, but its principal use is still a
comparative one. For example, suppose that in Fig. 58 the quantity
variations (the X scale) are heights of children of a specified age
while the curve represents the number of children having the
indicated heights. The question is asked whether or not a certain
child is normal in height. If the child has less height than
height A, how much less must he be so that lack of development
in this respect indicates need for medical advice? At once it is
suggested that it is important to determine how much on the
average children vary from this normal height. Accordingly, the
principal use of the average as a summary figure, when used as a
norm, is to compare individual variations with the average and
to compare individual variations with the average amount of
variation to be expected.
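
The comparison just described, of one child's deviation from the norm A with the average amount by which children deviate from it, can be sketched in a few lines of Python. The heights below are hypothetical figures invented for illustration, and the use of the mean absolute deviation as the "average amount of variation" is an assumption made for this sketch rather than a measure prescribed at this point in the text.

```python
from statistics import mean

# Hypothetical heights (inches) of children of one specified age.
# These figures are invented for illustration only.
heights = [48, 49, 50, 50, 51, 51, 51, 52, 52, 53, 54]

# The central tendency A, used here as the norm.
norm = mean(heights)

# Average amount by which the children vary from A
# (mean absolute deviation, an assumed choice of measure).
average_deviation = mean(abs(h - norm) for h in heights)


def deviation_in_average_units(height):
    """Express one child's deviation from A in units of the average deviation."""
    return (height - norm) / average_deviation


# A child of 46 inches, compared with the norm and with the usual variation.
ratio = deviation_in_average_units(46)
```

A ratio far below -1 would indicate that the child falls short of the norm by several times the usual amount of variation; how large a shortfall calls for attention remains a matter of judgment outside the arithmetic.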

The second type of comparison is the difference in central
tendencies existing between two distributions a and b, as illustrated

¹ Fisher, R. A., Statistical Methods for Research Workers (1941), Section 1.
References to section numbers are valid for any edition of this book.



Fig. 58. A central tendency as a norm.





in Fig. 59. This difference is measured by comparing the
averages of the two distributions, for example, by the comparison of
the average height of children in third grade with the average
height of children in sixth grade. Such a comparison is rational
only where the units of the two frequency distributions are
comparable.



Fig. 59. Two different central tendencies. (Horizontal axis: quantity
variations.)

Fig. 60. Similar central tendencies but different variability about
the central tendencies. (Horizontal axis: quantity variations.)

The third type of comparison is illustrated in Fig. 60, in
which some sort of measurement of the variability of the
variations about the average is required for making the comparison;
for example, an average of the variations from the central
tendency could be used. Such measures are called "measures of
variability."



Fig. 61. Similar central tendencies but different types of skewness of
distribution about central tendencies.

Figure 61 illustrates the fourth type of comparison.
Frequency curves a and b have peaks plumb with the same quantity,
A, but a is skewed to the left and b is skewed to the right. The
central tendency A is a value of greatest frequency in both
curves, but the lower range of curve a is farther from A than is
the lower range of b, and the upper range of a is much nearer





to A than is the upper range of b. School grades sometimes
have a frequency distribution like a, with the most common grade
around 70 or 80 and with very few above 90, yet with some grades
below 20 or even as low as 10. Personal incomes are distributed
like curve b, with the most common income at an amount near
to the lowest and a few incomes of amounts far above the most
common amount. When frequency curves like a or b in Fig. 61
are encountered, it is desirable to have some way to measure
skewness and evaluate its importance in connection with the
interpretation of other statistics about the frequency curve.

Fig. 62. Different central tendencies and different variabilities.

In the fifth type of comparison, illustrated by Fig. 62, not only
may it be desirable to compare average with average and
variability with variability in aggregate terms, but it may be essential
also to find a way to compare relative variability. The variability
in b relative to its average may not be so much larger than the
relative variability in a as the graph seems to show. The graph
shows that the absolute variability in b is greater; but it may
be that the relative comparison is the more significant one. To
make the relative comparison requires the calculation of further
information.

Fig. 63. Similar central tendencies, similar variabilities, and absence
of skewness, but different concentrations at center and along tails.
(Horizontal axis: quantity variations, centered at A.)

Curves a and b in Fig. 63, which illustrates the sixth type of
comparison, have the same central tendencies and approximately
the same average deviations about their central tendencies; but b
has a relatively greater concentration of small variations close to
the central tendency and also relatively more extreme variations
than does a. Another way of looking at this difference is to note
that the shoulders of a are broader than the shoulders of b and
that the top of a is flatter than the top of b. The relative
flatness of top or breadth of



shoulder of a distribution is called "kurtosis." The measurement
of this characteristic is important in determining the relative
importance of small variations from average in the two curves.

It appears to follow from the above discussion of six types
of comparison that the analysis of frequency distributions requires
the calculation of the average and, in addition, the calculation
of measures of dispersion.¹
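
The distinction drawn in the fifth type of comparison, between absolute and relative variability, may be sketched as follows. The two sets of figures are hypothetical, and dividing the average deviation by the arithmetic mean is only one assumed way of putting variability in relative terms; it is offered as an illustration, not as the measure the text later adopts.

```python
from statistics import mean


def average_deviation(values):
    """Mean absolute deviation of the values about their arithmetic mean."""
    center = mean(values)
    return mean(abs(v - center) for v in values)


def relative_variability(values):
    """Average deviation expressed as a fraction of the arithmetic mean."""
    return average_deviation(values) / mean(values)


# Hypothetical distributions a and b: b has ten times the average
# and ten times the absolute variability of a.
a = [8, 9, 10, 11, 12]
b = [80, 90, 100, 110, 120]

absolute_a, absolute_b = average_deviation(a), average_deviation(b)
relative_a, relative_b = relative_variability(a), relative_variability(b)
```

Here the absolute variability of b is much the larger, yet the relative variabilities of the two sets agree, which is the situation the text warns a graph may conceal.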

THEORETICAL SIGNIFICANCE OF FREQUENCY CURVES
Histograms and Frequency Curves. It has been noted that a
frequency distribution may be graphed in the form of a histogram,
that is, a figure in which the frequency of any class interval is
represented by a rectangle erected on that interval as a base and
with a height equal to the observed frequency.² If the data are
continuous in character, that is, if they change by very small
jumps, it may become reasonable to represent the frequency
distribution by a smooth frequency curve rather than by a broken
histogram.

Area Histograms. It is possible to make certain modifications
in the form of the ordinary histogram to represent the frequency
of cases occurring in any class interval, not by the height of the
rectangle, but by the area of the rectangle. If the class interval
is equal to unity, an area histogram is identical with one in which
frequencies are represented by heights, since the altitude
multiplied by the base equals the area. But if the class interval is
greater than unity, the height of an area histogram will be
proportionately reduced; if the class interval is less than unity, the
height will be proportionately increased. This follows because in
an area histogram the frequency of any class interval is given by
the height of the rectangle erected on it, multiplied by the length
of its base (that is, by the size of the class interval). In
histograms of the area type, it follows that the total area of the
histogram always equals the total number of cases, N.
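
The rule for an area histogram (height equals frequency divided by the size of the class interval, so that height times base gives back the frequency) can be checked with a short sketch. The class limits and frequencies below are hypothetical and chosen only to include intervals both wider and narrower than one another.

```python
# Hypothetical classes: (lower limit, upper limit, frequency).
classes = [(0, 10, 4), (10, 20, 10), (20, 40, 12), (40, 45, 3)]


def area_histogram_heights(classes):
    """Height of each rectangle = frequency / size of class interval."""
    return [f / (upper - lower) for lower, upper, f in classes]


heights = area_histogram_heights(classes)

# Area of each rectangle = height x base; the areas sum to N.
total_area = sum(
    h * (upper - lower) for h, (lower, upper, _) in zip(heights, classes)
)
n = sum(f for _, _, f in classes)
```

Whatever class intervals are chosen, `total_area` should agree with `n`, the total number of cases.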

Relative Frequencies. The histogram may be further modified
by making it represent relative or proportional frequencies
rather than absolute frequencies. Following is a table showing
a proportional frequency distribution.³

¹ See pp. 168 ff.

² For illustrations see Figs. 64 and 66, pp. 187-188.
³ Cf. p. 141.





Maternal Mortality in Cities of 100,000 or More Population in the
United States, 1938

                                     Relative number of cities
Death rates          Number of     Expressed as     Expressed as
(number per 1,000    cities        proportions      percentages
live births)
      X                 F              F/N            100 F/N
     (1)               (2)             (3)              (4)

      1-                2             0.022             2.2
      2-               16             0.172            17.2
      3-               18             0.193            19.3
      4-               20             0.215            21.5
      5-               15             0.161            16.1
      6-               10             0.108            10.8
      7-                4             0.043             4.3
      8-                6             0.064             6.4
      9-                0             0.000             0.0
     10-                2             0.022             2.2

    Total              93             1.000           100.0



In the above table the figures in column (3) represent the
proportionate frequencies, namely, the proportionate number of
cities having the specified maternal mortality rates. Since this
illustration has a class interval of 1, an area histogram could be
obtained by plotting the frequencies of column (2) in the form of
vertical bars, with the heights equal to the respective frequencies.
A proportional area histogram could be obtained by similarly
plotting the frequencies shown in column (3); because in the
resulting histogram, the area of each rectangle would represent
the proportion of the total number of cases falling in a class
interval; it would represent F/N instead of F. The total area
of such a histogram will always equal unity, just as the total of
column (3) equals unity. This will be true no matter what the
form or shape of the histogram, because $\Sigma F = N$.
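
As a check on the statement that the proportions must total unity because $\Sigma F = N$, the columns of the maternal-mortality table can be recomputed. In this sketch the frequency for the class "7-" is taken as 4, the value implied by the printed total of 93 cities and the printed proportion 0.043; that figure is an inference, and the variable names are illustrative.

```python
# Lower class limits (deaths per 1,000 live births) and number of cities.
# The frequency 4 for class "7-" is inferred from the total (93) and
# the printed proportion (0.043).
lower_limits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
frequencies = [2, 16, 18, 20, 15, 10, 4, 6, 0, 2]

n = sum(frequencies)                           # total number of cities, N
proportions = [f / n for f in frequencies]     # column (3): F/N
percentages = [100 * p for p in proportions]   # column (4)

# With a class interval of 1, each bar's area equals its height,
# so the total area of the proportional histogram is the sum of F/N.
class_interval = 1
total_area = sum(p * class_interval for p in proportions)
```

Rounded to three places, the computed proportions may differ in the last digit from a printed column that has been adjusted to total exactly 1.000.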

Frequency Curves. Suppose that the data, from which the
histogram has been constructed, are a sample from a very large
set of cases, theoretically an infinite set. For instance, the
data might be the heights of 100 adult males of the white race,
instead of the mortality statistics above illustrated. The 100














heights then would be a sample of the heights of all adult men
of that race, presumably millions of men. In such a relatively
small sample the size of the class interval cannot be reduced
without causing the histogram to show very irregular fluctuations.
If, however, many cases are added to the number in the sample,
say heights of 200 men, the size of the class interval could be
reduced, for example, from 10 units to 6 units, without causing
the occurrence of such irregularities. In fact, if the number in the
sample is made larger and larger and at the same time the size of
the class interval is continuously reduced, the histogram will
tend to become more and more regular, and the tops of the
rectangles, which are getting narrower and narrower, will come
closer and closer to forming a smooth continuous curve (a
frequency curve). In such a manner the frequency curve may be
viewed as the limit that an area histogram of relative frequencies
approaches as the number of cases is increased and the size of the
class interval is reduced indefinitely. The frequency curve is the
distribution of a theoretically infinite set of data with a
theoretically infinitesimal class interval.
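
The limiting process described above can be imitated by simulation. The sketch below assumes, purely for illustration, that heights follow a normal curve with a hypothetical mean of 68 inches and spread of 3 inches; the function names, the seed, and the sample sizes are likewise assumptions. It builds an area histogram of relative frequencies and measures how far the tops of the rectangles lie from the smooth curve.

```python
import math
import random


def relative_area_histogram(data, width):
    """Return {class lower limit: height}, each bar having area F/N."""
    n = len(data)
    counts = {}
    for x in data:
        lower = math.floor(x / width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return {lower: f / (n * width) for lower, f in counts.items()}


def normal_density(x, mu, sigma):
    """Ordinate of the normal frequency curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


random.seed(1)
MU, SIGMA = 68.0, 3.0  # hypothetical mean and spread of heights, in inches


def largest_gap(n, width):
    """Largest distance between a bar top and the curve at the class midpoint."""
    sample = [random.gauss(MU, SIGMA) for _ in range(n)]
    hist = relative_area_histogram(sample, width)
    return max(
        abs(h - normal_density(lower + width / 2, MU, SIGMA))
        for lower, h in hist.items()
    )


small_sample_gap = largest_gap(100, 2.0)       # few cases, wide classes
large_sample_gap = largest_gap(100_000, 0.25)  # many cases, narrow classes
```

One would expect the second gap to be much the smaller, in keeping with the text, though the exact figures depend on the random draws.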

Being the limit approached by an area histogram of relative
frequencies, the frequency curve has a total area (between the
curve and the x axis) that is always equal to unity.
Furthermore, any section of area under the curve¹ will give the relative
frequency of the cases falling within the class interval marking
off that section of area. It is upon this basis that tables of
relative frequencies are constructed for certain well-known
frequency curves.²
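
To illustrate how a section of area under a frequency curve yields a relative frequency, the sketch below uses the normal curve in its standard form (mean 0, unit spread) as an assumed example of a "well-known" curve, relying on `NormalDist` from Python's standard `statistics` module. The limits chosen are illustrative.

```python
from statistics import NormalDist

# The normal curve in standard form, assumed here as the example curve.
curve = NormalDist(mu=0.0, sigma=1.0)


def relative_frequency(lower, upper):
    """Area under the curve between the two limits of a class interval."""
    return curve.cdf(upper) - curve.cdf(lower)


# Relative frequency of cases within one unit of spread of the mean.
within_one = relative_frequency(-1.0, 1.0)

# Taken over a very wide interval, the area is practically unity.
nearly_whole = relative_frequency(-10.0, 10.0)
```

Printed tables of relative frequencies tabulate areas of this kind for a series of limits, so that no integration is needed by the user.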

Uses of Frequency Curves. Frequency curves are hypothetical,
but they are idealizations of frequency distributions that are real.
They serve many useful purposes, and in the theory of statistics
they are indispensable. One important use of frequency curves
is the graduation of frequency distributions obtained from actual
observation. Suppose, for example, that a frequency
distribution has been constructed using a class interval of 10 units.
Suppose further that the number of cases is such that any smaller
class interval would introduce marked irregularities into the
distribution, irregularities that it is believed would not be present

• See Fig 91 p 2W and Fig 94 p 277 

² See Appendix, Table VI.





if an infinitely large number of cases were observed. In this
case a frequency curve fitted to the distribution (histogram)
may be the best means of estimating the true frequency for any
given class interval. In other words, the frequency curve affords
a graduation for the frequency distribution. The frequency
curve makes it possible to interpolate values not given directly
by the original sample frequency distribution.

Besides serving to graduate a given set of data, frequency
curves facilitate in other ways the description and comparison of
frequency distributions. For instance, the peakedness or
flatness of a particular frequency curve, called the "normal
frequency curve," is taken as the standard to which the peakedness
or flatness of a given distribution is generally referred. Again,
theoretical analysis shows that data affected by certain kinds of
forces will tend to be distributed in the form of particular types
of frequency curves. Certain types of curves, therefore, become
the expected norm for all data affected by particular kinds of
forces. As a consequence, the hypothesis that variations in a
given set of data have resulted from certain forces may be tested
by noting how well the distribution of the data conforms to the
type of frequency curve that these forces may be expected to
produce. In such instances frequency curves help to explain the
underlying causes of variation. Such an analysis is of special
importance when it is assumed, as it is in so many statistical
procedures, that chance is the fundamental cause of variation.
It is to be noted that a difference in the general form of two
frequency distributions may in some cases be looked upon as of
more fundamental importance than a mere difference in their
averages, dispersion, and the like, because such a difference in
form may indicate a contrast in the type of forces causing
variation in the data. To detect a fundamental difference of this
kind frequency curves are used.

Still another useful purpose served by frequency curves is in
sampling analysis. Since a chapter is subsequently devoted to a
discussion of sampling, it need merely be touched upon and
simply illustrated in very general terms at this point.¹ For

¹ See Chap. XII; see also Smith, J. G., and A. J. Duncan, Sampling
Statistics, pp. 107-109, Parts II and III.



illustration, suppose that a large number of balls, each with a
number written on it, are placed in a big bowl and thoroughly
mixed. Suppose that 10 balls are drawn at random from the
bowl and their numbers read off and averaged. Suppose that
this sampling operation is repeated over and over again, the
balls being replaced and thoroughly mixed after each set of
drawings. Experience shows that unless the distribution of
numbers is freakish, the distribution of sample averages will
approximate the so-called "normal" frequency curve. If, instead of
the average of the respective 10 readings, a certain measure of
the variation around their averages, known as the "variance,"
had been recorded in each instance, then the frequency
distribution of these measurements of variation would have tended to
conform to a frequency curve known as the "χ² curve."¹ The
significant thing is that "sampling distributions" of this kind tend
to conform to specific frequency curves that may be described
by definite mathematical formulas. In general, these formulas
are expressed in terms of the characteristics of the "population"
(in the illustration, the bowl of numbers) from which the
samples are drawn. The consequence is that if a random sample
has been obtained from an unknown population, it is possible
from knowledge of the sampling distributions of various sample
measurements to make certain inferences regarding the nature
of the population from which the sample has been drawn. This
is probably the most important use that is made of frequency
curves in statistical analysis.
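
The bowl experiment lends itself to a small simulation. In the sketch below the contents of the bowl (1,000 balls numbered at random from 1 to 100), the number of repetitions, and the seed are all hypothetical choices; only the procedure of drawing 10 balls, averaging, and replacing them follows the text.

```python
import random
from statistics import mean, pstdev

random.seed(7)

# Hypothetical bowl: 1,000 balls carrying arbitrary numbers from 1 to 100.
bowl = [random.randint(1, 100) for _ in range(1000)]


def sample_average(bowl, size=10):
    """Draw `size` balls at random and average their numbers.

    random.sample leaves the bowl unchanged, so every set of drawings
    is made from the full, thoroughly mixed bowl.
    """
    return mean(random.sample(bowl, size))


# Repeat the sampling operation over and over again.
averages = [sample_average(bowl) for _ in range(2000)]

population_mean = mean(bowl)
mean_of_averages = mean(averages)
spread_of_population = pstdev(bowl)
spread_of_averages = pstdev(averages)
```

The sample averages should cluster about the average of the bowl with a much smaller spread than the individual numbers; a histogram of `averages` could then be compared with the normal curve.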

MEASUREMENTS OF SUMMARIZATION AND COMPARISON 

Population, Parameters, and Statistics. Population. To say
that the population of the United States is one hundred and
thirty million people is a familiar use of the word "population."
In statistics the word is used in the same familiar sense, but it is
also used in a more general sense, referring to the count of
persons or of animals of any kind or even to the count of
inanimate things. To statisticians the term means all the things,
animate or inanimate as the case may be, of a given kind in
the known universe or in a specified universe, for example, all the
people on the earth, or all the people in the United States if the
¹ This is read "chi square"; the letter χ is the Greek small chi.




universe is more specific. An example of an inanimate
population would be all the petroleum in the known universe or, if a
more specific universe is considered, all the petroleum in the
United States.

Parameters and Statistics. In the theory of statistics the
measurements of the characteristics of the population are called
"parameters." The average height of all people living in the
United States is a parameter of the population. No one has
ever actually measured the heights of all the people living in the
United States, and it is not likely that anyone ever will do so.
Nevertheless, this population does exist. In practice, it is
much easier to estimate the average height of all the people by
taking the average of a sample of the people. This latter
average, the average of the sample, is called a "statistic."
Accordingly, parameters are measures of the characteristics of the
population, and the corresponding sample measures are statistics
commonly used to estimate these parameters. A statistic is thus
a value computed from an observed sample in order to
characterize the population from which it is drawn. Parameters are
the characters of the population.¹

In accordance with this terminology, the quantities to be
obtained as measures of central tendencies are "statistics," the
arithmetic mean is a "statistic," the range (difference between
the highest and lowest magnitude) of a frequency distribution is
a "statistic."

Averages. There are several kinds of averages. The one
most familiar is the arithmetic mean. The others most
generally presented are the median, mode, geometric mean, and
harmonic mean. The most commonly used averages are the
mean, the median, and the mode. In this chapter each will be
viewed in its simplest aspect, and at the same time the symbolic
language associated with the analysis of frequency distributions
will be introduced.

The Arithmetic Mean. By definition, the arithmetic mean
is the sum of the cases divided by the number of cases. For
example, taking a simple case of ungrouped data, i.e., where the
frequencies are 1 throughout (each X occurs once),
$F_1, F_2, \ldots, F_7$ each = 1:


¹ Fisher, op. cit., pp. 7-8, 41.





      X                 F

  $X_1$ = 2         $F_1$ = 1
  $X_2$ = 3         $F_2$ = 1
  $X_3$ = 4         $F_3$ = 1
  $X_4$ = 6         $F_4$ = 1
  $X_5$ = 8         $F_5$ = 1
  $X_6$ = 9         $F_6$ = 1
  $X_7$ = 10        $F_7$ = 1

  $\Sigma X$ = 42   $\Sigma F$ = 7

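The sum of the variable magnitudes in this case is 42. The
number of variable magnitudes is 7. Hence, by definition, the
arithmetic mean is 42/7, or 6.
Symbolically,

$$\frac{42}{7} = \frac{\Sigma X}{\Sigma F} = \frac{\Sigma X}{N}, \quad \text{i.e.,} \quad \frac{X_1 + X_2 + \cdots + X_7}{F_1 + F_2 + \cdots + F_7}$$

The arithmetic mean is represented by the symbol $\bar{X}$, and
hence

$$\bar{X} = \frac{\Sigma FX}{N} \qquad (1)$$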


In frequency distributions the F is not equal to 1 throughout
but varies. An illustration of the calculation of the arithmetic
mean of a frequency distribution is shown below:

      X        F        FX

      2        3         6
      3        3         9
      4        6        24
      5        9        45
      6        6        36
      7        3        21

  $\Sigma F$ or N = 30     $\Sigma FX$ = 141

It should be noted that the sum of the X's cannot be obtained
by adding the first column, because the various X's occur 3, 6, or
9 times. Consequently, the sum of the X's is obtained by
multiplying each X by its respective frequency and then adding
the products:

$$\Sigma FX = 141$$
$$\Sigma F \text{ or } N = 30$$


SUMMARIZATION AND COMPARISON 


169 


and therefore, by definition, 

Z = = 4.7 
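
The calculation above can be sketched in a few lines of Python. This is only an illustration of the definition, not part of the original text; the names `values`, `freqs`, and `frequency_mean` are assumptions introduced here.

```python
# Values X and their frequencies F, copied from the table above.
values = [2, 3, 4, 5, 6, 7]
freqs = [3, 3, 6, 9, 6, 3]


def frequency_mean(xs, fs):
    """Arithmetic mean of a frequency distribution: sum(F * X) / sum(F)."""
    total_fx = sum(f * x for x, f in zip(xs, fs))  # the FX column, added up
    n = sum(fs)                                    # N, the number of cases
    return total_fx / n


mean = frequency_mean(values, freqs)
print(mean)
```

Setting every frequency to 1 reduces the same function to the ungrouped case, where the mean is simply the sum of the cases divided by their number.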

If the frequencies of a frequency distribution are expressed in 
relative numbers, i.e., if each frequency is expressed relative to 
the total number of cases, $F_1/N, F_2/N, \ldots, F_n/N$, the arithmetic 
mean is merely the sum of the third column, as follows: 

     X      F/N     (F/N)X

     2      0.1      0.2
     3      0.1      0.3
     4      0.2      0.8
     5      0.3      1.5
     6      0.2      1.2
     7      0.1      0.7

            1.0      4.7


Following the definition, the arithmetic mean is the sum of the 
third column divided by the sum of the second column; but the 
sum of the second column is 1, by definition. Consequently, 
the arithmetic mean is the sum of the third column. 

$$\bar{X} = \sum \frac{F}{N} X \qquad (1a)$$

This modified form of the definition of the arithmetic mean is very 
convenient in certain statistical problems. 

The sum of the deviations of the cases from the arithmetic 
mean is equal to zero. This may be demonstrated as follows: 
Given the variable $X_1, X_2, \ldots, X_n$, with $\bar{X} = \sum FX / N$, 

$$F_1(X_1 - \bar{X}) = F_1 x_1$$
$$F_2(X_2 - \bar{X}) = F_2 x_2$$
$$\cdots$$
$$F_n(X_n - \bar{X}) = F_n x_n$$

$$\sum FX - \sum F\bar{X} = \sum Fx \quad \text{by adding}$$

The small x is used regularly to refer to the deviations of the 
variable from the arithmetic mean. 

When added, $\sum F\bar{X}$ becomes $N\bar{X}$ because $\bar{X}$ is constant and 
$\sum F$ is equal to N. 

If the value of $\bar{X}$ given in Eq. (1) is substituted in 

$$\sum FX - N\bar{X} = \sum Fx$$

it becomes 

$$\sum FX - N\,\frac{\sum FX}{N} = \sum Fx$$

By canceling the N, this becomes 

$$\sum FX - \sum FX = \sum Fx$$

and hence 

$$\sum Fx = 0 \qquad (2)$$
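
Equation (2) can be checked numerically. The sketch below is an illustration only; it reuses the frequency distribution with X = 2, ..., 7 given earlier and uses exact fractions so that rounding does not obscure the result. The variable names are assumptions made for this example.

```python
from fractions import Fraction

# Frequency distribution used earlier in the chapter.
values = [2, 3, 4, 5, 6, 7]
freqs = [3, 3, 6, 9, 6, 3]

n = sum(freqs)
# Exact arithmetic mean, sum(F * X) / N = 141/30.
mean = Fraction(sum(f * x for x, f in zip(values, freqs)), n)

# Sum of the frequency-weighted deviations x = X - mean.
weighted_deviation_sum = sum(f * (x - mean) for x, f in zip(values, freqs))

print(mean, weighted_deviation_sum)
```

With floating-point numbers the same sum may come out as a very small nonzero value, which is why exact fractions are used here.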

The Median and the Quartiles. In its original simplicity and 
by definition the median is not a mathematical concept like the 
arithmetic mean. On the contrary, the median is a position 
average. By definition, the median is that value than which there 
is an equal number of cases larger and smaller. When the cases 
are arranged in an array, the median is either the value of the 
middle one (when there is an odd number) or some value between 
the two middle ones (when there is an even number). Normally, 
in the latter instance, the arithmetic mean of the two 
middle cases is taken as the median value. To illustrate from a 
very simple example with an odd number of cases: 

     X

     1
     2
     3
     6
     7
     8
     9

$X_4$, or 6, is the median by definition because it is the middle 
one in the array. Mi = 6. (Mi is the conventional symbol for 
median.) 

It is to be noted that $\bar{X} = 5.143$. 

In this illustration it is seen that 1, the first case, is 5 smaller 
than the median, while 9, the last case, is only 3 larger than the 
median. This preponderance of smallness of the variable 
results in an arithmetic mean smaller than the median. By 
definition the arithmetic mean is affected by every variation and 
consequently by extreme variations. It is affected by the size 
and the number of cases above and below it. The median, 
on the other hand, by definition, is not affected by the size of 
the cases above or below it. 

When the frequencies vary, the median may be found as in 
the following illustration: 

     X      F      Cumulated F

     1      2           2
     2      4           6
     3      5          11
     4      8          19
     5      7          26
     6      3          29
     7      2          31

           31

Thus, there are 2 cases where X = 1; there are 4 cases where 
X = 2; etc. In all, there are 31 cases ($\sum F = 31 = N$), and the 
middle one is the sixteenth. Mi = 4. That the median is 4 is 
quickly seen by the examination of the cumulated frequencies in 
the third column. This is equivalent to taking the median equal 
to the (N + 1)/2th case, a procedure often adopted when dealing 
with ungrouped data.¹ 
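
Reading the median off the cumulated frequencies can be sketched as follows. This is a minimal illustration under the assumption that the total number of cases is odd, as in the table above; the function name `median_odd_total` is introduced here and is not from the text.

```python
from itertools import accumulate

# Frequency distribution from the illustration above.
values = [1, 2, 3, 4, 5, 6, 7]
freqs = [2, 4, 5, 8, 7, 3, 2]


def median_odd_total(xs, fs):
    """Value of the (N + 1)/2-th case; assumes xs is sorted and N is odd."""
    n = sum(fs)
    if n % 2 == 0:
        raise ValueError("this sketch handles only an odd number of cases")
    middle = (n + 1) // 2
    # Walk down the cumulated-frequency column until it reaches the middle case.
    for x, cumulated in zip(xs, accumulate(fs)):
        if cumulated >= middle:
            return x


print(median_odd_total(values, freqs))
```

The loop stops at the first value whose cumulated frequency reaches the sixteenth case, which is the row the text points to in the third column.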

In general terms, the first quartile $Q_1$ is that value below which 
one-fourth of the cases fall and above which three-fourths of the 
cases fall. Similarly, the third quartile $Q_3$ is that value below 
which three-quarters of the cases fall and above which one-fourth 
of the cases fall. The median is the second quartile, or $Q_2$. In 
the above frequency distribution, N/4 = 7¾ and 3N/4 = 23¼. 
$Q_1$ should thus be some value below which 7¾ cases fall and above 
which 23¼ cases fall. For this distribution, it happens that the 
seventh and eighth cases are identical, and it therefore follows 
that the value of $Q_1$ is the common value of the seventh and eighth 
cases, or $Q_1 = 3$. If the seventh and eighth cases had not been 
the same, then $Q_1$ could be taken as a value between the values 
of the seventh and the eighth case, to be found by interpolation. 

For ungrouped data, it is recommended that a uniform and 
systematic method of interpolation be adopted, as follows:² 

¹ When the data are grouped, it is simplest to find the median by interpolation 
within the class interval for the N/2th case. This method is described 
and illustrated in the next chapter. See pp. 218-220. 

² For the method of interpolation when the data are grouped, see pp. 
218-220. 

Take $Q_1$ as the (N/4 + ½)th case, $Q_2$, or Mi, as the (N/2 + ½)th 
case, and $Q_3$ as the (3N/4 + ½)th case. Consider, for example, the 
cases 5, 9, 11, 16, 25, 31, 38, 43, 45, and 49. The (N/4 + ½)th case 
would be the (10/4 + ½) or third case, i.e., $Q_1 = 11$. The median 
would be the (10/2 + ½) or 5½th case. Since there is no 5½th case, 
however, but only a fifth case, 25, and a sixth case, 31, the median 
is taken as the value that lies just halfway between the fifth and 
sixth cases, i.e., Mi = 25 + (31 - 25)/2 = 28. The third quartile 
would be the (30/4 + ½) or eighth case, i.e., $Q_3 = 43$. 

As another illustration, suppose 51 is added to the set of 
numbers, making a total of eleven instead of ten numbers. Then 
the first quartile would be the (11/4 + ½) or the 3¼th case. But 
there is no 3¼th case, only a third case, 11, and a fourth case, 16. 
Hence $Q_1$ is taken as the value that lies one-fourth of the distance 
between the third and fourth cases. The difference between the 
third and fourth cases is 16 - 11 = 5, and $Q_1$ is therefore taken 
as 11 + 5/4 = 12¼. Similarly, Mi is the (11/2 + ½) or sixth case, 
which is 31. 
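
The interpolation rule for ungrouped data can be sketched as below. This is an illustrative reading of the rule, with linear interpolation between neighboring cases whenever the position is fractional; the helper names `case_at` and `quartiles` are assumptions, and exact fractions are used so that results such as 12¼ are not rounded.

```python
from fractions import Fraction


def case_at(sorted_cases, position):
    """Value at a 1-based, possibly fractional, position in a sorted array."""
    lower = int(position)          # whole part: the case just below the position
    frac = position - lower        # fractional part of the distance to the next case
    value = Fraction(sorted_cases[lower - 1])
    if frac:
        value += frac * (sorted_cases[lower] - sorted_cases[lower - 1])
    return value


def quartiles(cases):
    """Q1, Mi, Q3 taken as the (kN/4 + 1/2)-th cases for k = 1, 2, 3."""
    xs = sorted(cases)
    n = len(xs)
    half = Fraction(1, 2)
    return tuple(case_at(xs, Fraction(k * n, 4) + half) for k in (1, 2, 3))


ten = [5, 9, 11, 16, 25, 31, 38, 43, 45, 49]
print(quartiles(ten))          # the ten cases of the first illustration
print(quartiles(ten + [51]))   # the eleven cases of the second illustration
```

For the eleven-case illustration the function also returns a third quartile, which the text does not compute; it follows from applying the same rule to the (33/4 + ½)th case.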

The Mode. As in the case of the median, the mode is not a 
mathematical concept. Moreover, it is not a “position average.” 
The mode is an average that is described in terms of relative 
frequency of occurrence. It is defined as the magnitude that 
occurs more frequently than any other. The mode is the most 
probable magnitude and might be considered a “probability 
average,” because it is often thought of in terms of probabilities. 
It may be illustrated as follows: 

     X      F

     2      1
     3      2
     4      4
     6      7
     7      8
     8      5
     9      4
    10      3

           34

By definition the mode (Mo) is 7, because this value occurs 
more frequently than any of the others. The probability of the 
mode is 8/34 and is greater than the probability of any other value 
of X. 

It is to be noted that the $\bar{X}$ of this example is 6.706 and the 
median is 7. The mode is not affected by the size of the cases 
above or below it, nor is it affected by the number of cases 
above or below it, within certain rational limits. For example, 
in the illustration, two magnitudes (X = 8) could be added 
to the distribution and the mode would remain 7, as before; 
but if five magnitudes (X = 8) were added to the distribution, 
then 8 would become the mode, as its frequency would then 
be 10. 

It has been established that, when the distribution is only 
moderately skewed, the mode can be estimated from the mean 
and median by the following approximate formula: 

$$\text{Mo} = \bar{X} - 3(\bar{X} - \text{Mi}) \qquad (3)$$

Accordingly, the mode may be estimated if the mean and 
the median have been calculated. In actual problems involving 
grouped data the mean and the median are both more accurately 
determinable than is the mode, and for this reason the above 
formula often gives more satisfactory results than any convenient 
direct procedure. This is called the “mathematical mode.” 
It should be emphasized that the formula should not be used, 
however, if the distribution is very skewed. 
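
The behavior of the mode described above, and the approximate formula (3), can be sketched as follows. The frequencies are read from the table in this section; the frequency 5 at X = 8 is inferred from the stated mean of 6.706 and from the remark that five added cases would bring it to 10. The function names are assumptions made for this illustration.

```python
def mode(xs, fs):
    """Value with the greatest frequency (assumes a single maximum)."""
    return max(zip(fs, xs))[1]


def estimated_mode(mean, median):
    """Approximate mode from Eq. (3): mean - 3 * (mean - median)."""
    return mean - 3 * (mean - median)


values = [2, 3, 4, 6, 7, 8, 9, 10]
freqs = [1, 2, 4, 7, 8, 5, 4, 3]


def with_extra_eights(extra):
    """Copy of the frequencies with `extra` more cases added at X = 8."""
    bumped = list(freqs)
    bumped[values.index(8)] += extra
    return bumped


print(mode(values, freqs))                   # original distribution
print(mode(values, with_extra_eights(2)))    # two cases added at X = 8
print(mode(values, with_extra_eights(5)))    # five cases added at X = 8
```

`estimated_mode` is included only to show the form of Eq. (3); as the text cautions, it is meant for moderately skewed distributions.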

The Geometric Mean. The geometric mean (G.M.) is a 
mathematical concept and is defined as the nth root of the product 
of n variables X. Accordingly, the geometric mean of 5, 8, 
and 25 is $(5 \times 8 \times 25)^{1/3} = 10$. The geometric mean may 
also be defined as the antilogarithm of the arithmetic mean of 
the logarithms of the variable X, i.e., 

$$\log \text{G.M.} = \frac{\sum \log X}{N} \qquad (4)$$

This may be illustrated as follows: 

    log  5 = 0.69897
    log  8 = 0.90309
    log 25 = 1.39794

    $\sum \log X$ = 3.00000

    $\sum \log X / N$ = 3.00000/3 = 1.00000

The antilogarithm of 1.00000 is 10; hence, G.M. = 10. 


Just as the arithmetic mean balances the aggregate deviations, 
so the geometric mean balances the ratios of the variations, 

$$\frac{X_1}{\text{G.M.}} \times \frac{X_2}{\text{G.M.}} \times \cdots \times \frac{X_n}{\text{G.M.}} = 1 \qquad (5)$$

For, by taking logarithms, this expression becomes 

$$\log X_1 + \log X_2 + \log X_3 + \cdots + \log X_n - N \log \text{G.M.} = 0$$

or 

$$\sum \log X - N \log \text{G.M.} = 0$$

But from Eq. (4), 

$$N \log \text{G.M.} = \sum \log X$$

and hence the expression is shown to be true. 

In some types of problems the geometric mean gives more 
satisfactory results than the arithmetic mean. For example, it 
is necessary to use the geometric mean to average percentage 
increases of population over successive years or decades or to 
average percentage changes in income, production, and the like.* 
Thus, in the column marked X of the following table the estimated 
national income of each year is expressed as a percentage 
of the preceding year. 

              Estimated national        Increase in national
              income produced in        income expressed as
    Year      the United States,¹       percentage of preceding
              billions of dollars       year, X

    1933            47.3
    1934            54.6                     115.43
    1935            59.2                     108.42
    1936            68.9                     116.39
    1937            73.1                     106.10
    1938            67.0                      91.66
    1939            69.7                     104.03

¹ Survey of Current Business, Vol. 20 (March, 1940), p. 19; (April, 1940), p. 11. 

If the average annual percentage increase is obtained by 
calculating the arithmetic average, the answer obtained is 
642.03/6 = 107.01, which represents an average annual percentage 
increase of 7.01 per cent. Now, if 7.01 is used as a constant 
annual percentage increase from 1933, the following figures would 
be obtained: 

* Chaddock, R. E., Principles and Methods of Statistics (1925), pp. 126-127. 
Croxton, F. E., and D. J. Cowden, Applied General Statistics (1939), 
pp. 225-226. 

              Constant 7.01 Percentage
    Year      Increase from 47.3 in 1933

    1934      107.01 per cent of 47.3  = 50.62
    1935      107.01 per cent of 50.62 = 54.17
    1936      107.01 per cent of 54.17 = 57.97
    1937      107.01 per cent of 57.97 = 62.03
    1938      107.01 per cent of 62.03 = 66.38
    1939      107.01 per cent of 66.38 = 71.03


But in 1939 the actual figure, as shown in the preceding table, 
was 69.7; and the average percentage yearly increase could 
not have been so large as 7.01. To obtain the correct percentage 
increase, the geometric and not the arithmetic mean 
should be calculated in this instance. Following the formula 
given above for the geometric mean, it is calculated for this 
problem as follows: 

              Estimated national        Logarithm of the
              income produced in        percentage increase,
    Year      the United States,        log X
              billions of dollars

    1933            47.3
    1934            54.6                     2.06232
    1935            59.2                     2.03511
    1936            68.9                     2.06591
    1937            73.1                     2.02572
    1938            67.0                     1.96218
    1939            69.7                     2.01716

If the average annual percentage increase is now obtained 
by calculating the geometric mean of the rates of increase, 
by first taking the arithmetic mean of the logarithms 

$$\frac{\sum \log X}{N} = \frac{12.16840}{6} = 2.02807$$

and then taking the antilogarithm (antilogarithm of 2.02807), 
the answer obtained is 106.68, or an average annual percentage 
increase of 6.68 per cent. If 6.68 is assumed to be the average 
annual percentage increase since 1933, the following figures would 
be obtained: 



              Constant 6.68 Percentage
    Year      Increase from 47.3 in 1933

    1934      106.68 per cent of 47.3  = 50.46
    1935      106.68 per cent of 50.46 = 53.83
    1936      106.68 per cent of 53.83 = 57.43
    1937      106.68 per cent of 57.43 = 61.27
    1938      106.68 per cent of 61.27 = 65.36
    1939      106.68 per cent of 65.36 = 69.73

In 1939 the actual figure, as shown above, was 69.7, to which 
69.73 is a close approximation, and hence the average annual 
percentage increase apparently was in fact close to 6.68. 
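
The comparison of the two averages can be sketched in Python. This is an illustration only: it works from the unrounded year-to-year ratios rather than the two-decimal percentages tabulated above, so its results may differ from the printed figures in the last decimal place. The names `income`, `ratios`, and `project` are assumptions introduced here.

```python
import math

# Estimated national income, 1933 through 1939, billions of dollars (from the table).
income = [47.3, 54.6, 59.2, 68.9, 73.1, 67.0, 69.7]

# Each year as a ratio to the preceding year (the X column divided by 100).
ratios = [later / earlier for earlier, later in zip(income, income[1:])]

arithmetic = sum(ratios) / len(ratios)
# Geometric mean via logarithms, as in Eq. (4).
geometric = math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def project(start, rate, years):
    """Figure reached after compounding a constant yearly ratio."""
    return start * rate ** years


print(round(100 * arithmetic, 2), round(100 * geometric, 2))
print(round(project(income[0], arithmetic, 6), 2),
      round(project(income[0], geometric, 6), 2))
```

Compounding the geometric mean over the six years returns the 1939 figure, apart from rounding, whereas compounding the arithmetic mean overshoots it, which is the point the text makes.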

The Harmonic Mean. The harmonic mean (H.M.) is the 
reciprocal of the average of the reciprocals of observations of the 
variable X; thus 

$$\text{H.M.} = \frac{1}{\dfrac{1}{N}\sum \dfrac{1}{X}} = \frac{N}{\sum \dfrac{1}{X}} \qquad (6)$$

Accordingly, the harmonic mean of 5, 8, and 25 would be found 
as follows: 

From a table of reciprocals or by calculation the reciprocals 
of 5, 8, and 25 are determined (0.200, 0.125, and 0.040), and 
hence the harmonic mean by definition is 

$$\frac{3}{0.200 + 0.125 + 0.040} = \frac{3}{0.365} = 8.22$$

The geometric mean of these three numbers, as discovered above, 
is 10; the arithmetic mean is 5 + 8 + 25 = 38 divided by 3, 
or 12.67. It is thus seen that the arithmetic mean is the 
largest, the geometric mean next, and the harmonic mean the 
smallest of these three averages. It is always true, for positive 
values that are not all equal, that¹ 

$$\text{H.M.} < \text{G.M.} < \bar{X} \qquad (7)$$
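
The three averages of 5, 8, and 25 can be computed side by side. The sketch below is only an illustration of the definitions; the variable names are assumptions, and the geometric mean is subject to ordinary floating-point rounding.

```python
import math

values = [5, 8, 25]
n = len(values)

arithmetic = sum(values) / n                   # sum of the cases over N
geometric = math.prod(values) ** (1 / n)       # nth root of the product
harmonic = n / sum(1 / x for x in values)      # reciprocal of the mean reciprocal

print(round(harmonic, 2), round(geometric, 2), round(arithmetic, 2))
```

The ordering of the three results is the relationship stated in Eq. (7).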

The usefulness of the harmonic mean arises in connection with 
certain types of problems in which variable quantities of one 
variable are compared with a constant quantity of another. 
For illustration, speeds may be looked upon as variable numbers 
of miles per minute (a constant quantity of time) or as variable 
amounts of time required to cover a given distance. Similarly, 
prices may be looked upon as variable amounts of money per 
unit of goods sold (a constant quantity of goods) or as variable 
amounts of goods that can be purchased with $1. In many 
such problems the choice of the variable for which the quantity 
is always constant is optional, depending upon the type of 
information it is desired to emphasize. There is a nice distinction 
between the mean and the harmonic mean wherever 
such interchangeability is present. This will be illustrated 
by examples. 

¹ For a proof, see G. R. Davies and W. F. Crowder, Methods of Statistical 
Analysis in the Social Sciences, p. 313. 

The accompanying table shows data on prices of corn. 

    Wholesale Price of No. 3 Yellow Corn
    Year      Dollars per Bushel
    1913            0.61
    1919            1.59
    1929            0.93
    1939            0.50

Source: Survey of Current Business, April, 1940, p. 18. 

In the table the amount of money varies, but the quantity 
of corn is constant. The average price per bushel may be calculated 
directly from this table, thus: 3.63/4 = 0.9075. In order 
to obtain the harmonic mean, the reciprocals of these prices must 
first be calculated. 

    Wholesale Price of No. 3 Yellow Corn
    Year      Bushels per Dollar
    1913            1.64
    1919            0.63
    1929            1.08
    1939            2.00

The average of these reciprocals must be computed, thus: 
5.35/4 = 1.3375. 


The reciprocal of the latter number must be obtained, namely, 
0.74766. This last number is the harmonic mean of the prices 
expressed in dollars per bushel. The harmonic mean, therefore, 
of the prices per bushel of No. 3 yellow corn is approximately 
75 cents; and it is the price per bushel of the average amount of 
No. 3 yellow corn that could have been purchased for $1, or, 
in other words, it is the reciprocal of the average amount of No. 3 
yellow corn that could have been purchased for $1. If the 
reciprocal of the mean price, 0.9075, is taken, a figure of quite 
different significance is obtained, namely, 1.102. This reciprocal 
is the average number of bushels that could have been bought at 
the mean price. 
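
The corn-price calculation can be sketched as follows. This is an illustration only, with variable names chosen here. Because the table rounds each reciprocal to two decimals before averaging, the unrounded harmonic mean computed below differs slightly from the printed 0.74766 in the third decimal place.

```python
# Wholesale prices of No. 3 yellow corn, dollars per bushel (from the table).
prices = [0.61, 1.59, 0.93, 0.50]

mean_price = sum(prices) / len(prices)                 # dollars per bushel

bushels_per_dollar = [1 / p for p in prices]           # reciprocals of the prices
mean_bushels = sum(bushels_per_dollar) / len(prices)   # average bushels bought for $1

harmonic_mean_price = 1 / mean_bushels                 # harmonic mean of the prices
bushels_at_mean_price = 1 / mean_price                 # a different quantity

print(round(mean_price, 4), round(harmonic_mean_price, 4))
print(round(mean_bushels, 4), round(bushels_at_mean_price, 4))
```

The last two printed figures are the two quantities the text distinguishes: the average number of bushels bought for $1, and the number of bushels that $1 buys at the mean price.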

In deciding whether to use the arithmetic or the harmonic 
mean in any given problem, it should be determined which 
magnitude should be regarded as the constant (for example, the 
amount of corn bought or the amount of money spent), a matter 
that can usually be decided without difficulty in a practical 
problem. If the data are recorded with the appropriate quantity 
constant, the arithmetic mean may be used. If the appropriate 
item is made a variable as the data are tabulated, the harmonic 
mean must be used. 

Another illustration will serve to clarify further the use of 
the harmonic mean. The efficiency of a fighting airplane may 
be determined, in part at least, by its speed, which can be 
expressed either as the number of miles flown per minute or the 
amount of time required to fly a mile. Following are the results 
of tests of a plane under trial: 

    Results of Tests

    Miles per minute      6    4    7    6    5

Is the significant measure the rate at which a plane flies or 
the amount of time required to fly a number of miles? If it is 
admitted that the rate at which the plane flies is the important 
consideration (that is, the number of minutes required to fly a 
mile), the reciprocal of the harmonic mean is the relevant measure, 
inasmuch as the recorded data make the time element constant 
and not the distance flown. The arithmetic mean, if 
calculated, would not be lacking in significance, but its reciprocal 
should not be compared with rate measures in which the number 
of miles is constant and the time varies. The average number of 
miles per minute is $\bar{X} = 28/5 = 5.6$. The reciprocal of this number, 
0.17857, is not the harmonic mean, and it is not the average time 
that it takes to travel a mile. On the contrary, 0.17857 minute 
is the amount of time it requires to go a mile when traveling at the 
average number of miles per minute. The average amount of 
time required to travel a mile is a different thing, namely, the average 
of 1/6, 1/4, 1/7, 1/6, and 1/5 minute, or 0.185 minute. While the two 
results are close together in value, it would make a large difference 
in calculations having to do with hours of time if the arithmetic 
mean were used when the harmonic mean ought to have been 
used. 

The Concept of an Average as a Summary Figure. The general 
significance of an average as a summary figure may be illustrated. 
Suppose that information concerning the heights of mature males 
in Newark, N.J., is desired. Heights of all the mature males 
in Newark are therefore measured to the nearest 1/8 inch. The 
data collected will constitute complete information about the 
heights of mature males in Newark. But this knowledge, in 
untabulated form without summary figures, is not easy to comprehend. 
It is necessary to analyze this total knowledge in 
some manner so that it may become more significant. The 
manner in which the analysis will proceed depends upon the 
object in mind; for example, an answer to any one of the following 
questions may give a more significant view: 

1. What height will coincide with the greatest number of 
recorded observations? The answer to this question is, of 
course, the mode. 

2. What is the height such that greater and smaller heights 
have been recorded with equal frequency? The answer to this 
question is the median. 

3. What is the height such that the sum of the squares of the 
differences between it and the recorded observations is a minimum? 
Or what is the height such that the algebraic sum of 
the differences between it and the recorded observations is zero? 
The answer to this question will be the arithmetic mean. 

4. What is the height H such that the product of the ratios of 
the recorded observations to H is unity? The answer to this 
question will be the geometric mean. 

5. If several rates of speed were given in miles per second, how 
many seconds on the average will be required to travel 1 mile? 
The answer is the reciprocal of the harmonic mean. (Heights 
could not be used as an illustration because the harmonic mean 
is significant only in the dual-variable type of quantity expression, 
as explained above.) 

The term “average” is a generic term, and any one of these 
summary figures may be called an average; the decision as to 
which average should be used depends upon what question is to 
be answered. If the median height is known and the height of a 
man is greater than the median, it can be inferred that the man is 
taller than most men. If a man's height is equal to the mode, 
it is known that he has the most common, or usual, height. If 
height is analyzed in the abstract, as it might be in research on 
the effects of heredity and environment, the arithmetic mean is 
likely to be used, for such analyses ordinarily involve the solution 
of problems in mathematical terms. 

MOMENTS 

Definition. In physics, “moment” is a measure of a force with 
respect to its tendency to produce rotation. The strength of 
the tendency depends on the amount of force and the distance 
from the origin of the point at which the force is exerted. If a 
number of forces $F_1, F_2, F_3, \ldots$ at distances $X_1, X_2, X_3, \ldots$ 
are applied, the moment of the first force about the origin is 
$F_1X_1$, the moment of the second force is $F_2X_2$, etc. These 
moments are additive, so that $\sum FX$ is the total moment about 
the origin. If the total moment is divided by the total force, 
the quotient is termed “a moment coefficient.” The formula is 
$\sum FX/N$, where $N = \sum F$ is the total force. 

It will be recognized that the formula for a moment coefficient 
is identical with that for an arithmetic mean. This identity 
has led statisticians to speak of the arithmetic mean as the 
“first moment about the origin.” Technically the mean is a 
moment coefficient and not a total moment, but in the case of 
frequency curves, with which mathematical statistics is primarily 
concerned, the total frequency N is generally taken as unity,¹ 
so that the total moment and the moment coefficient are identical. 
In any case, it has become customary in statistics to speak of 
the mean $\bar{X} = \sum FX/N$ as the first moment about the origin, and 
the distinction between total moment and moment coefficient is 
ignored. 

The concept of moments is also extended to higher powers. 
Thus in statistics $\sum FX^2/N$ is termed the “second moment 
about the origin,” and $\sum FX^3/N$ is called the “third moment 
about the origin,” etc. In general, the moments about zero 
are as follows: 

¹ See pp. 276-277 and Appendix, Table VI. 



mean are intermediary values useful for calculating measures 
of variability, skewness, and other characteristics of the frequency 
distribution. Because of their great convenience in 
obtaining measures of the various characteristics of a frequency 
distribution, the calculation of the first four moments about the 
mean may well be made the first step in the analysis of a frequency 
distribution. This valuable feature will be illustrated 
in the ensuing pages and in the next chapter. Following are 
important generalizations concerning moments measured from 
the arithmetic mean: for all frequency distributions, 

$$\mu_0 = 1$$
$$\mu_1 = 0$$
$$\mu_2 = \sigma^2 \;{}^1$$

and in symmetrical distributions, 

$$\mu_3 = 0$$
$$\mu_5 = 0$$
$$\mu_7 = 0$$

for all “odd” numbered moments. 
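
These generalizations can be checked on a small example. The sketch below assumes the usual definition of the kth moment about the mean, $\sum F(X - \bar{X})^k / N$, and uses a hypothetical symmetrical distribution chosen only for illustration; neither the data nor the function name comes from the text.

```python
from fractions import Fraction


def moment_about_mean(xs, fs, k):
    """k-th moment about the mean: sum(F * (X - mean)**k) / N, kept exact."""
    n = sum(fs)
    mean = Fraction(sum(f * x for x, f in zip(xs, fs)), n)
    return sum(f * (x - mean) ** k for x, f in zip(xs, fs)) / n


# Hypothetical symmetrical distribution (an assumption for this example).
values = [1, 2, 3, 4, 5]
freqs = [1, 2, 3, 2, 1]

moments = [moment_about_mean(values, freqs, k) for k in range(6)]
print(moments)
```

In the printed list the entry for k = 0 is 1, the entry for k = 1 is 0, the entry for k = 2 is the square of the standard deviation, and the odd-numbered entries vanish because the distribution is symmetrical.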

VARIABILITY 

It was indicated above that the chief interest of statistics is 
in variability; summary figures such as averages are useful as 
points of departure for further study of the frequency distribution. 
It may be noted that the principle of averaging is fundamental 
throughout, for all the various methods of summarizing, 
whether it be central tendencies or variations from points of 
central tendencies, use the principle of averaging as a method of 
summarization or measurement. 

The Range. The most obvious method of measuring variability 
is to take the difference between the highest value and the 
lowest value; this difference is called the “range.” Thus in a 
set of several hundred grades, if the highest grade is 92 and the 
lowest 13, the range is 92 - 13 = 79. The range is easily 
understood and easy to compute, but it is dependent entirely on 
the two extreme items. It is therefore seldom used as the 
measure of variability when accuracy and stability of results are 

¹ Read “sigma square.” 




¹ See, however, Smith and Duncan, Sampling Statistics, pp. 294-296, for 
the use of the range in sampling analysis. 

² Cf. Kelley, Truman L., Statistical Method (1923), p. 24. 

desired. It is better to use some measure of variability that is 
dependent on more than just two, if not all, of the cases.¹ 

The Average Deviation. The average deviation (A.D.) is the 
arithmetic average of the variations of the data from their central 
tendency. This measure of variability may be illustrated as 
follows: 

                Deviations from the Mean
                     ($\bar{X} = 6$)
     X                    x

     2                   -4
     3                   -3
     4                   -2
     6                    0
     8                    2
     9                    3
    10                    4

                     -9      +9
                   $\sum |x| = 18$

The deviations have to be added without regard to sign; otherwise, 
their sum is zero. In this example there are seven deviations 
(one of which is zero), and hence the average deviation 
is 18/7, or 2.57. 

$$\text{A.D.} = \frac{\sum |x|}{N} \qquad (12)$$

where $x = X - \bar{X}$. 

The average deviation could be measured from the median or 
the mode, as well as from the arithmetic mean. It is 
usually measured from the median, since it is least when so 
computed.² Then the formula for the 
average deviation from the median is 

$$\text{A.D.}_{\text{Mi}} = \frac{\sum |X - \text{Mi}|}{N} \qquad (12a)$$

Mi is used as a subscript to A.D. to indicate that the average 
deviation is measured from the median. It is to be noted that 
the average deviation is a measure of variability based 
on observed cases. 



In the foregoing example each X occurred only once, and so 
the F's were all unity. When there is more than one of any X, 
the formulas are 

$$\text{A.D.} = \frac{\sum F|x|}{N} \qquad \text{A.D.}_{\text{Mi}} = \frac{\sum F|X - \text{Mi}|}{N}$$
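
The two formulas can be sketched as one function that takes the point from which the deviations are measured. This is an illustration only; the function name and the choice of passing the center explicitly are assumptions. The data are the seven ungrouped cases 2, 3, 4, 6, 8, 9, 10 with every frequency equal to 1.

```python
from statistics import median


def average_deviation(xs, fs, center):
    """sum(F * |X - center|) / N, ignoring the signs of the deviations."""
    n = sum(fs)
    return sum(f * abs(x - center) for x, f in zip(xs, fs)) / n


values = [2, 3, 4, 6, 8, 9, 10]
freqs = [1] * len(values)          # each X occurs once

mean = sum(f * x for x, f in zip(values, freqs)) / sum(freqs)
# Expand the table into individual cases to find the median.
cases = [x for x, f in zip(values, freqs) for _ in range(f)]

ad_from_mean = average_deviation(values, freqs, mean)
ad_from_median = average_deviation(values, freqs, median(cases))
print(round(ad_from_mean, 2), round(ad_from_median, 2))
```

In this particular example the mean and the median are both 6, so the two measures coincide; with other data they generally differ.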


Standard Deviation. The most generally used measure of 
variability, however, is the standard deviation. This will be 
readily understood when it is seen that the standard deviation 
is easily treated mathematically; the average deviation has 
very distinct limitations in this respect owing to the disregard 
of plus and minus signs. In the case of the standard deviation 
this defect is overcome by squaring the deviations before they 
are averaged and then taking the square root of the average. 
The symbol for the standard deviation is the small Greek letter 
$\sigma$, read “sigma.” By definition, the standard deviation is the 
square root of the average of the squared deviations from the mean. 
Symbolically, 

$$\sigma = \sqrt{\frac{\sum Fx^2}{N}} \qquad (13)$$

As may be seen by comparing this definition with the definition 
of the moments about the mean [Eqs. (9)], the standard deviation 
is the square root of the second moment, i.e., 

$$\sigma = \sqrt{\mu_2} \qquad (13a)$$

Following is an illustration of the computation of the standard 
deviation: 

                Deviations from the
    Variable      Mean ($\bar{X} = 5$)      Deviations Squared
       X               x                    $x^2$

       1              -4                    16
       2              -3                     9
       3              -2                     4
       4              -1                     1
       5               0                     0
       6               1                     1
       7               2                     4
       8               3                     9
       9               4                    16

                                     $\sum x^2 = 60$
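
The computation tabulated above can be sketched as follows. This is an illustration only, with variable names chosen here; it follows the definition of the standard deviation as the square root of the average squared deviation from the mean, dividing by N.

```python
import math

cases = [1, 2, 3, 4, 5, 6, 7, 8, 9]    # the variable X, each value occurring once

n = len(cases)
mean = sum(cases) / n
squared_deviations = [(x - mean) ** 2 for x in cases]   # the x-squared column

sum_of_squares = sum(squared_deviations)
sigma = math.sqrt(sum_of_squares / n)

print(sum_of_squares, round(sigma, 2))
```

Python's `statistics.pstdev` computes the same quantity, since it also divides by N rather than by N - 1.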



¹ Cf., for proof, p. 212. 


In the above illustration the standard deviation ($\sigma$) is $\sqrt{60/9}$, or 
2.58. It will be noted here that the F's are all unity and therefore 
do not enter into the calculations. 

Variance. The square of the standard deviation is called the 
“variance.” The variance is merely another 
name for the second moment about the mean [see the definition 
of this in Eqs. (9)]. The second moment is smaller when calculated 
from the mean than when calculated from any other point.¹ 

SKEWNESS 

Definition and Significance. “Skewness” means asymmetry. 
Frequency distributions that have extreme variations resulting 
in a longer tail in one direction from the peak than in the other 
direction from the peak are asymmetrical in appearance. Such 
distributions are called “skewed distributions.” Up to this 
point the discussion has concerned methods for measuring 
central tendencies (averages) and methods for measuring variability 
(average deviation and standard deviation). The 
significance of an average may be considerably modified when 
considered in comparison with the average deviation or standard 
deviation; but, in addition, the significance of the averages 
depends upon the symmetry or lack of symmetry in the distribution 
of individual cases. Measures of skewness are accordingly 
desirable. 

Skewness Measured by Relationship between the Mean, 
Median, and the Mode. Skewness is most easily measured by 
the relationship between the mean, the median, and the mode; 
for it will be recalled that the mode is not affected by the magnitudes 
or number of variations above or below it. The median 
is affected by the number of variations, but not by their magnitudes. 
The mean is affected by both the number and the 
magnitudes of the variations from it. Consequently it would be 
expected that 

1. The mode, by definition, will remain at the point of greatest 
frequency, whether or not the distribution is skewed. 

2. The median will be pulled away from the mode in distributions 
that are skewed, since the larger number of items on 
one side pull it from the point of greatest frequency. 

3. The mean in such distributions will be pulled away from the 
mode still farther, since the larger number and extreme magnitude 
of the items on one side pull it farther away from the point of 
greatest frequency. 

These points are illustrated in the following frequency distributions:¹ 

    1. Symmetrical      2. Skewed Positively      3. Skewed Negatively
       X      F             X      F                  X      F

       1      3             1      6                  1      1
       2      4             2      8                  2      2
       3      6             3      9                  3      3
       4      9             4      8                  4      5
       5     10             5      7                  5      7
       6      9             6      5                  6      9
       7      6             7      3                  7     10
       8      4             8      2                  8      8
       9      3             9      1                  9      6

             54                   49                        51

         (1)                    (2)                       (3)
    $\bar{X}$ = 5             $\bar{X}$ = 3.92            $\bar{X}$ = 6.10
    Mi = 5              Mi = 3.69             Mi = 6.33
    Mo = 5              Mo = 3                Mo = 7
    $\sigma$ = 2.08           $\sigma$ = 2.05             $\sigma$ = 2.01


Figures 64 to 66 show in graphic form the frequency distributions 
given on this page. The relationship between the 
three averages will be more clearly visualized from these figures. 
In Fig. 64, for example, all three averages equal 5. In Fig. 65, 
which is the positively skewed frequency distribution, the three 
differ from each other: the mode is 3, the median is 3.69, and the 
mean is 3.92. The extreme variations toward the higher values 
give the frequency distribution a longer tail to the right, or 
toward the higher values of X, and this pulls the median and the 
mean in that direction from the mode. 

In Fig. 66 a negatively skewed frequency distribution is 
illustrated. This histogram is a graph of the third set of figures 
shown on this page. In this figure, too, the averages differ 
from each other, but here the mode is the largest. The extreme 
variations toward the lower values give the frequency distribution 
a longer tail to the left, or toward the lower values of X, 

¹ Calculations of averages were made on the assumption that the X values 
are mid-points of class intervals of unity. 



and this pulls the median and the mean in that direction from 
the mode; so while the mode is 7, the median is 6.33 and the 
mean is 6.10. 

Fig. 64. A symmetrical frequency distribution. 

Fig. 65. A positively skewed frequency distribution. 

Skewness may accordingly be measured by the difference 
between the mean and the mode. For the above examples, 

    $\bar{X}$ - Mo = 0        $\bar{X}$ - Mo = +0.92        $\bar{X}$ - Mo = -0.90

This is a measure of the aggregate amount of skewness, but 
how significant is this amount? One device to weigh this is to 
compare the aggregate amount of skewness with the standard 


de\iation and thtrcbj to obtain a coefHcieat of skew nos 
follows 

For ex implc (2) 


sk 


\ - Mo 


0 92 
2 05 


0 45 


5, OS 


(U) 


I or example (3) 



-0 90 ^ 
201 


I he relatne amount of skewncs'. or asj mmctij , in the-^c two 
divlnbutions comes out equal illhougli one is posituo \nd the 
other IS negatu c 
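As a brief illustration of Eq. (14), the following Python sketch recomputes the coefficient of skewness from the averages quoted in the text for examples (2) and (3). The function name `pearson_mode_skewness` is an assumption made for this illustration and does not come from the text.

```python
def pearson_mode_skewness(mean, mode, sigma):
    """Coefficient of skewness of Eq. (14): (mean - mode) / sigma.

    The function name is hypothetical; the formula is the one in the text.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (mean - mode) / sigma


# Averages quoted in the text for the positively and negatively skewed examples.
sk_positive = pearson_mode_skewness(mean=3.92, mode=3, sigma=2.05)
sk_negative = pearson_mode_skewness(mean=6.10, mode=7, sigma=2.01)

print(round(sk_positive, 2), round(sk_negative, 2))
```

Dividing by the standard deviation is what makes the two results comparable: the aggregate differences (+0.92 and -0.90) are in the units of X, whereas the coefficients are pure ratios.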



Another measure of skewness is based on the median and the mean. It has been established that in a moderately asymmetrical distribution, if the mean is pulled a distance P away from the mode, the median is pulled approximately two-thirds P away from the mode in the same direction; that is,

    $\bar{X} - Mo = 3(\bar{X} - Mi)$

Hence skewness can also be measured by three times the distance between the mean and the median, as follows:

    $sk = \frac{3(\bar{X} - Mi)}{\sigma}$        (14a)

The second equation has the advantage over (14) in that the median is often easier to locate than the mode. The mode is frequently difficult to locate in a sample distribution and is subject to wide fluctuations of samples; in addition, its location is often dependent merely upon the selection of the class interval.

Skewness Measured by the Median and Quartiles. A simple measure of skewness, and one that is easily comprehended, is obtained by comparing the location in a frequency distribution of its median and quartiles. This can be illustrated by taking the same three distributions used above and calculating their respective first and third quartiles (the medians are already calculated). From Fig. 67 it is seen that in the symmetrical distribution the first quartile is smaller than the median by the same amount that the third quartile is larger than the median; that is, $Q_1$ and $Q_3$ are equidistant from the median. Accordingly,

    $Q_3 - Mi - (Mi - Q_1) = 0$

If the terms are rearranged, this may be written

    $Q_3 + Q_1 - 2Mi = 0$

Fig. 67. Relation between the first and third quartiles and the median of a symmetrical distribution.

From Fig. 68, the positively skewed case, it is seen that the first quartile is smaller than the median by an amount considerably less than the amount by which the third quartile is larger than the median; accordingly,

    $Q_3 - Mi - (Mi - Q_1) = +0.222$

From Fig. 69, the negatively skewed distribution, it is seen that the first quartile is smaller than the median by an amount considerably larger than the amount by which the third quartile is larger than the median, for

    $Q_3 - Mi - (Mi - Q_1) = -0.154$

Fig. 68. Relation between the first and third quartiles and the median of a positively skewed distribution.

Fig. 69. Relation between the first and third quartiles and the median of a negatively skewed distribution.

If the location of the quartiles compared with the location of the median in each distribution is now compared with half the distance between the two quartiles (i.e., the average distance of the two quartiles from the median), a coefficient or relative measure of skewness is obtained, and the formula that measures skewness is as follows:

    $sk = \frac{Q_3 - Mi - (Mi - Q_1)}{Q} = \frac{Q_3 + Q_1 - 2Mi}{(Q_3 - Q_1)/2}$        (15)

where Q is used as a symbol signifying the semiquartile range.

Not only is the coefficient of skewness based upon the median and quartiles a good one because these are usually easy to find, but also this coefficient of skewness has the advantage that it has value limits between +2 and -2. It can be no greater than +2, that is, when positive skewness is so great that the median equals the first quartile. It can be no less than -2, that is, when negative skewness is so great that the median equals the third quartile. In Figs. 68 and 69, respectively, the coefficients of skewness are +0.146 and -0.106, expressed as ratios. Expressed as percentages of skewness, they are +14.6 per cent and -10.6 per cent.

Third Moment as a Measure of Skewness. The cube root of the third moment is also a good measure of skewness. This is due to the fact that (1) if the distribution is symmetrical, the third moment will be zero; but (2) if the distribution is not symmetrical, the third moment will not be equal to zero but will have a positive or negative value according to whether the distribution is skewed positively or negatively. This is illustrated by simple examples showing a symmetrical and a positively skewed distribution. The negatively skewed distribution is left to the student to work out.

1. SYMMETRICAL DISTRIBUTION

    X      F      x      Fx     Fx²     Fx³
    1      2     -3      -6     18     -54
    2      3     -2      -6     12     -24
    3      4     -1      -4      4      -4
    4      5      0       0      0       0
    5      4      1       4      4       4
    6      3      2       6     12      24
    7      2      3       6     18      54
          23              0     68       0



2. POSITIVELY SKEWED DISTRIBUTION

    X      F      x      Fx     Fx²     Fx³
    1      6     -2     -12     24     -48
    2      8     -1      -8      8      -8
    3     10      0       0      0       0
    4      4      1       4      4       4
    5      3      2       6     12      24
    6      2      3       6     18      54
    7      1      4       4     16      64
          34              0     82      90

In the second example the third moment is equal not to zero but to 90/34 = 2.65. The measures of skewness by this method would be 0 for the first example and $\sqrt[3]{2.65} = 1.38$ for the second. Symbolically this measure of skewness is

    $\sqrt[3]{\frac{\sum Fx^3}{N}}$

It may be seen from the definition of the third moment [Eqs. (9)] that this measurement of skewness is the cube root of the third moment. Expressed as a coefficient of skewness, where the aggregate amount of skewness is in terms of the standard deviation, this measure of skewness is as follows:

    $sk = \frac{\sqrt[3]{\mu_3}}{\sigma}$
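The third-moment measure can be checked numerically. The Python sketch below takes the frequencies of the positively skewed example as read from its table (X = 1 to 7 with F = 6, 8, 10, 4, 3, 2, 1) and computes the cube root of the third moment divided by the standard deviation. The function names are illustrative assumptions, not notation from the text.

```python
import math


def moments_about_mean(values, freqs):
    """Return (mean, second moment, third moment) of a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    mu2 = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    mu3 = sum(f * (x - mean) ** 3 for x, f in zip(values, freqs)) / n
    return mean, mu2, mu3


def signed_cube_root(value):
    """Real cube root that keeps the sign of its argument."""
    return math.copysign(abs(value) ** (1.0 / 3.0), value)


# Positively skewed example: X values and their frequencies F.
X = [1, 2, 3, 4, 5, 6, 7]
F = [6, 8, 10, 4, 3, 2, 1]

mean, mu2, mu3 = moments_about_mean(X, F)
sk = signed_cube_root(mu3) / math.sqrt(mu2)

print(mean, round(mu2, 4), round(mu3, 4), round(sk, 3))
```

Taking the cube root with the sign preserved matters for negatively skewed data, where the third moment is negative and a plain fractional power would not return a real number.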


The Beta Coefficients. This last measure of skewness is of particular interest not only because it is based on a wholly mathematical procedure (it is not dependent on nonmathematical summaries like the median and quartiles or the mode), but also because it is directly related to one of the so-called beta coefficients. The beta coefficients are functions of the moments of the frequency distribution that have been found very useful in describing and distinguishing various types of frequency distributions.¹ The two principal beta coefficients are $\beta_1$ and $\beta_2$, which are defined as follows:

    $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$        $\beta_2 = \frac{\mu_4}{\mu_2^2}$

It will be noted that the sixth root of $\beta_1$ is identically the coefficient of skewness $sk = \sqrt[3]{\mu_3}/\sigma$, for $\mu_2 = \sigma^2$. Frequently $\sqrt{\beta_1}$ is itself taken as a measure of skewness, the square root being given the same sign as the third moment from which it is calculated.

When the beta coefficients are used to describe a frequency curve according to a formula developed by Karl Pearson, the skewness for this curve as measured by $(\bar{X} - Mo)/\sigma$ is found to be equal to

    $sk = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}$        (17)

When the data are such as to warrant the fitting of a smooth frequency curve, this is thus an excellent formula for measuring skewness. The curve, of course, does not have to be fitted to make use of the formula as a measure of skewness.

When $\beta_1$ is small, i.e., when the skewness is slight, and when $\beta_2$ is approximately equal to 3, as it is in the case of a normal distribution, this last equation shows that $\sqrt{\beta_1}$ is approximately equal to twice the skewness, the mode being that of the fitted Pearsonian curve. When the latter is calculated by the approximate formula $Mo = \bar{X} - 3(\bar{X} - Mi)$, the calculation of half the square root of $\beta_1$ will in certain cases serve as a rough check on the computation of skewness from the formula $sk = (\bar{X} - Mo)/\sigma$.

The importance of the second beta coefficient lies in the fact that it is a measure of kurtosis.²

KURTOSIS

Definition. Kurtosis is described by Karl Pearson as follows:³

Given two frequency distributions which have the same variability as measured by the standard deviation, they may be relatively more or less flat-topped than the normal curve. . . . A frequency distribution may, in other words, be symmetrical, but it may fail to be mesokurtic (equally flat-topped with the normal curve), and thus the Gaussian curve cannot describe it.

¹ Cf. Smith and Duncan, Sampling Statistics.
² See the next section.
³ "Skew Variation, a Rejoinder," Biometrika, Vol. 4 (1906), p. 173. Cited from H. M. Walker, Studies in the History of Statistical Method.



The "normal curve" to which this quotation refers is represented by the equation

    $y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}$        (18)

Because this curve has arisen so frequently in statistics and because it has been used as a type with which to compare other frequency curves, it has come to be known as the normal curve. Also, since Gauss early recognized its importance, it is sometimes called the Gaussian curve.¹

Fig. 70. Frequency distributions with greater and with less kurtosis than the normal curve.

As shown in graphic form earlier in this chapter (Fig. 63), when there is a marked concentration of very small variations about the central tendency, the frequency curve rises to a high peak, unlike the normal, or Gaussian, curve, which has a certain

¹ Cf. pp. 294-295.



CHAPTER VII

ILLUSTRATION OF FREQUENCY-DISTRIBUTION ANALYSIS

Data for a Frequency Distribution. Data selected to illustrate frequency-distribution analysis are presented in Table 11. Heights in inches of 300 members of the freshman class of 1943 were obtained from the records of the Department of Health and Physical Education, Princeton University. As presented in the table, the data are not arranged in a frequency distribution; they are listed at random. In order to make a frequency distribution of the data, it is necessary first to decide on the size and limits of the class interval to be used in the construction of the distribution, for the frequency distribution per se is a method of summarization compared with the manner of presentation of the data in Table 11.

CONSTRUCTION OF A FREQUENCY DISTRIBUTION

The Class Interval. Rules for Determining the Class Interval. The class interval is the unit of the frequency distribution; in other words, it is the size of the groups in which the data are summarized. In the data selected for illustration should the groups be 1-inch, ½-inch, ¼-inch, or 1/10-inch groups? That is, should the class interval be 1 inch, a half inch, a quarter inch, or a tenth of an inch?

A general rule for selecting the class interval is that it should be such as to make possible, without serious error, the treatment of all values assigned to any one of the classes as if they were equal to the mid-point or mid-value of the class. The lower limit of the class intervals also should be so selected as to facilitate this end. If the cases are concentrated, in fact, at the mid-point of the class interval or are evenly distributed throughout, it may, without serious errors in calculation, be assumed that all cases in the class are equal in value to the mid-value.

Another guide in the selection of the class interval is that it should be as large as possible, subject to the condition that the interval should not be so large as to conceal too much of the character of the variability. Indeed the most important purpose of the class interval is so to summarize . . . very large number of cases are measured; if a very large class interval is chosen, significant irregularities in the data may be concealed.

Table 11. Heights of 300 Eighteen-year-old Members of the Class of 1943, Princeton University
(In inches)







Ordinarily, the size of the class interval should be uniform throughout, because different-sized class intervals will complicate calculations. In some cases, however, it is necessary to use different-sized class intervals in order to give a proper picture of variability.

If other more important rules are not thereby violated, in the interests of simplicity the position of the class interval in the range should be such that the limits of the intervals are integers or such that the mid-values of the class intervals are integers. Where marked concentration about certain values exists, as is sometimes the case in dealing with discrete data, these values should so far as possible be made the mid-points of class intervals.

An Array of the Data. Intelligent determination of the class interval is aided by study of the data arranged in an array or scatter diagram such as Fig. 71, which is presented to illustrate the determination of the proper class interval. In the figure, the heights shown in Table 11 are arranged in an array. Because inspection of the data in Table 11 led to the suspicion that concentration points were present at the ¼-, ½-, and ¾-inch values, the array is presented in rows with these concentration points plumbed. Summing the columns as well as inspection of the detail of the scatter diagram show the concentration of frequencies at these values.

Frequency Distribution with Too Many Class Intervals. Examination of Fig. 71 suggests that a ¼-inch class interval beginning at 61.875 inches, as shown in Table 12, might be a good class interval for the data of Table 11, for the ¼-inch class interval with the lower limits as shown in Table 12 places the mid-values of class intervals at points of concentration. Such a frequency distribution contains over 60 rows, however, and, in addition, is uneven and irregular in appearance. Ten frequencies occur in the interval 66.875-, only five frequencies occur in the interval 68.625-, twelve frequencies occur in the intervals immediately below and above 68.625-. Moreover, it is not clear whether the modal class interval is 69½, 70½, 70¾, 71, or 72 inches, because an equal number (15) have each of these five heights.

The ¼-inch class interval is too small in this instance to disclose clearly the nature of variation in freshman heights.

A Larger Class Interval Reveals the Character of Variation. If 1 inch is taken as the class interval, the frequency distribution


Table 12. Frequency Distribution of the Heights of 300 Princeton Freshmen, Class of 1943

    Heights of freshmen, X         Number of freshmen having
    Interval      Mid-value        specified heights, F
    61.875-       62.00             1
    62.125-       62.25             0
    62.375-       62.50             0
    62.625-       62.75             0
    62.875-       63.00             1
    63.125-       63.25             0
    63.375-       63.50             1
    63.625-       63.75             0
    63.875-       64.00             0
    64.125-       64.25             1
    64.375-       64.50             0
    64.625-       64.75             0
    64.875-       65.00             1
    65.125-       65.25             0
    65.375-       65.50             1
    65.625-       65.75             2
    65.875-       66.00             1
    66.125-       66.25             1
    66.375-       66.50             6
    66.625-       66.75             3
    66.875-       67.00            10
    67.125-       67.25             4
    67.375-       67.50            12
    67.625-       67.75             5
    67.875-       68.00             9
    68.125-       68.25             6
    68.375-       68.50            12
    68.625-       68.75             5
    68.875-       69.00            12
    69.125-       69.25             9
    69.375-       69.50            15
    69.625-       69.75            11
    69.875-       70.00            12
    70.125-       70.25             6
    70.375-       70.50            15
    70.625-       70.75            15
    70.875-       71.00            15
    71.125-       71.25            10
    71.375-       71.50             9
    71.625-       71.75             8
    71.875-       72.00            15
    72.125-       72.25             2
    72.375-       72.50            12
    72.625-       72.75             6
    72.875-       73.00             7
    73.125-       73.25             5
    73.375-       73.50             5
    73.625-       73.75             4
    73.875-       74.00             6
    74.125-       74.25             3
    74.375-       74.50             3
    74.625-       74.75             2
    74.875-       75.00             1
    75.125-       75.25             3
    75.375-       75.50             1
    75.625-       75.75             3
    75.875-       76.00             0
    76.125-       76.25             1
    76.375-       76.50             1
    76.625-       76.75             0
    76.875-       77.00             0
    77.125-       77.25             1
                                  300



will contain 17 classes and will appear as shown in Table 13. In this frequency distribution the lower limits of the class intervals are so chosen that mid-values are at the 0.625-inch points (⅝ inch), which is a balancing center of the concentration points at the ¼-inch intervals, because at ⅝ inch each mid-value has two ¼-inch concentration points below it and two above it in the 1-inch class interval. This balancing position of the ⅝-inch points can be readily seen by an examination of Fig. 71.

In order to contrast the irregularities in the frequency distribution using too small a class interval with the regular appearance of the same frequency distribution using a larger class interval, Figs. 72 and 73 are presented. Figure 72 is a graph of the frequency distribution of heights of 300 Princeton freshmen, using a ¼-inch class interval. Figure 73 is a graph of the frequency distribution of heights of 300 Princeton freshmen, using a 1-inch class interval.

The argument for a class interval centered at the ⅝-inch point has been based on the assumption that measurements have been made to the nearest ¼ inch. In other words, a height recorded as 64.25 might be anything between 64.125 and 64.375. If measurements were always made to the lowest ¼ inch, then some other mid-point would be warranted, such as the ½-inch points, or integral values. Table 14 is one based on this assumption. Since the exact method of measurement is not known and since Table 14 is simplest in form, it is adopted for subsequent analysis. A graph of the distribution has already been shown in Fig. 71.

In frequency Tables 12 to 14, the class interval has been listed in two ways. (1) It has been described by writing on each line the lower limit of the class interval, followed by a dash. (2) It

Table 13. Frequency Distribution of the Heights of 300 Princeton Freshmen, Class of 1943

    Heights of freshmen, X         Number of freshmen having
    Interval      Mid-value        specified heights, F
    61.125-       61.625            1
    62.125-       62.625            1
    63.125-       63.625            1
    64.125-       64.625            2
    65.125-       65.625            4
    66.125-       66.625           20
    67.125-       67.625           30
    68.125-       68.625           35
    69.125-       69.625           47
    70.125-       70.625           51
    71.125-       71.625           42
    72.125-       72.625           27
    73.125-       73.625           20
    74.125-       74.625            9
    75.125-       75.625            7
    76.125-       76.625            2
    77.125-       77.625            1
                                  300
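The passage from the ¼-inch distribution of Table 12 to the 1-inch distribution of Table 13 is a matter of adding each group of four adjacent frequencies. A minimal Python sketch follows. The frequencies are those of Table 12, with the entries for mid-values 72.25 and 74.75 taken as 2 each, the values implied by the 1-inch totals of Table 13. The helper name `regroup` is hypothetical.

```python
def regroup(counts, group_size, lead_pad):
    """Add adjacent frequencies in groups of `group_size`.

    `lead_pad` empty cells are placed in front so that the first wide
    class starts at the intended lower limit; trailing cells are padded
    with zeros to complete the last wide class.
    """
    padded = [0] * lead_pad + list(counts)
    padded += [0] * (-len(padded) % group_size)
    return [sum(padded[k:k + group_size])
            for k in range(0, len(padded), group_size)]


# Frequencies for mid-values 62.00, 62.25, ..., 77.25 (quarter-inch classes).
# The two entries marked "assumed" are inferred from the Table 13 totals.
quarter_inch = [
    1, 0, 0, 0,    1, 0, 1, 0,    0, 1, 0, 0,    1, 0, 1, 2,
    1, 1, 6, 3,    10, 4, 12, 5,  9, 6, 12, 5,   12, 9, 15, 11,
    12, 6, 15, 15, 15, 10, 9, 8,
    15, 2, 12, 6,                  # 72.25 -> 2 (assumed)
    7, 5, 5, 4,
    6, 3, 3, 2,                    # 74.75 -> 2 (assumed)
    1, 3, 1, 3,    0, 1, 1, 0,    0, 1,
]

# The first 1-inch class (61.125-) holds mid-values 61.25, 61.50, 61.75, 62.00,
# so three empty quarter-inch cells precede the first observed one.
one_inch = regroup(quarter_inch, group_size=4, lead_pad=3)

print(len(one_inch), one_inch, sum(one_inch))
```

The same helper with a different `lead_pad` would give classes with other lower limits, which is the choice the text discusses when it weighs the ⅝-inch mid-values against integral limits.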







Table 14. Frequency Distribution of the Heights of 300 Princeton Freshmen, Class of 1943

    Heights of freshmen, X         Number of freshmen having
    Interval      Mid-value        specified heights, F
    62-           62.5              1
    63-           63.5              2
    64-           64.5              1
    65-           65.5              4
    66-           66.5             12
    67-           67.5             31
    68-           68.5             31
    69-           69.5             47
    70-           70.5             48
    71-           71.5             42
    72-           72.5             35
    73-           73.5             21
    74-           74.5             14
    75-           75.5              8
    76-           76.5              2
    77-           77.5              1
                                  300


has been described by writing in the next column the mid-value. Obviously, both methods of describing the class interval need not always be employed; the conventional procedure is to use the lower-limit description rather than the mid-value description. The mid-value can always be calculated by adding one-half the class interval to the lower limit of the class interval.

WORK SHEET FOR FREQUENCY-DISTRIBUTION ANALYSIS

The frequency distribution having been constructed, the procedure for frequency-distribution analysis will now be described. Table 15 is a work sheet for the analysis of a frequency distribution. In columns (1) and (2), under X and F, is copied the frequency distribution from Table 14. Entries in the remaining columns will be explained below. The work sheet is so constructed that advantage may be taken of certain economies in calculation. These economies arise from two sources: (1) the reduction in calculation due to the use of a short method that involves the calculation of the moments about an "arbitrary origin" and (2) a reduction in calculation due to the use of class intervals as units of deviation from the arbitrary origin.

Saving Calculation by Obtaining Moments about an Arbitrary Origin. In applying the short method, an arbitrary origin, which may be called A, is selected. While zero may be taken as an arbitrary origin (and often is in certain statistical problems), in the analysis of frequency distributions the amount of calculation is reduced by selecting a value for A somewhere near the middle of the range. The moments about the arbitrary origin are then calculated by measuring deviations from A in class-interval units, that is, in d/i's. Sometimes d' is used to symbolize d/i. The savings in calculation are due to the fact that all desired mathematical statistics can then be computed by the use of formulas from the four moments about the arbitrary origin.

Saving Calculation by Using Class-interval Units. Saving in the amount of calculation to obtain the various statistics results if the class-interval unit is used, particularly if the variable is in complex or fractional units or in large numbers. This economy is brought about by expressing the deviations in terms of class intervals instead of in original units, i.e., in d/i's instead of in d's. As pointed out above, this saving is augmented by selecting the arbitrary origin near the middle of the frequency distribution. If the arbitrary origin is at or near the middle class interval, the largest deviation in terms of class-interval units will then be little greater than half the number of class intervals in the frequency distribution. Since the deviations must be raised to the fourth power in order to calculate the fourth moment, substantial saving in calculation is secured by keeping class-interval deviations as small as possible by placing the arbitrary origin near the middle of the frequency distribution. It will be observed in Table 15 that the frequency distribution has been copied on the work sheet in such a position that the arbitrary origin is near the middle of the frequency distribution. It can also be seen that, when the class interval is uniform in size, recording the class-interval deviations in column (3) is merely a matter of proceeding by count above and below the arbitrary origin, that is, -1, -2, -3, etc., for successive smaller class-interval values, and 1, 2, 3, etc., for successive larger class-interval values.
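The counting procedure for column (3) can be sketched in a few lines of Python. The mid-values are those of Table 14, with A = 69.5 and i = 1 as stated for the work sheet; the function name is an assumption made for illustration.

```python
def class_interval_deviations(mid_values, origin, interval):
    """Deviation of each mid-value from the arbitrary origin, in units of d/i.

    Assumes every mid-value lies a whole number of class intervals from
    the origin, which holds when the class interval is uniform.
    """
    return [round((x - origin) / interval) for x in mid_values]


mid_values = [62.5 + k for k in range(16)]   # 62.5, 63.5, ..., 77.5
deviations = class_interval_deviations(mid_values, origin=69.5, interval=1.0)

# The largest number that must be raised to the fourth power on the work sheet.
largest_fourth_power = max(abs(u) for u in deviations) ** 4

print(deviations)
print(largest_fourth_power)
```

With the origin at zero instead, the largest fourth power would be that of 77.5, which shows why an origin near the middle of the range saves so much arithmetic.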



Entering the Frequency Distribution on the Work Sheet. The frequency distribution of freshmen heights shown in Table 14 has 16 class intervals, and if the mid-value of the central class interval is to be selected as the arbitrary origin, the class interval 62- will be entered in column (1) under "Interval" on the line opposite -7 in column (3) (d/i = -7). The remaining class intervals will be entered in succeeding lines until 77- will be opposite 8 in column (3) (d/i = 8). The mid-value of the central class interval is 69.5, which is opposite 0 in column (3) (d/i = 0). The corresponding frequencies are then entered in column (2). Full description of the data and their source is entered in the space provided at the top of the work sheet.

Calculation in Use of Work Sheet. The amount of calculation involved in the entries required for columns (4) to (9) can be reduced to a minimum by the following procedure:

In column (4), headed $F(d/i)$, enter the class-interval deviations multiplied by the frequencies [i.e., items in column (3) multiplied, respectively, by items in column (2)]. The algebraic sum of the figures in column (4), divided by N, equals the first moment (in class-interval units) about the arbitrary origin.

The figures in column (5), headed $F(d/i)^2$, are obtained by multiplying the items in column (4), respectively, by the corresponding items in column (3). The sum of figures in column (5), divided by N, equals the second moment (in class-interval units) about the arbitrary origin.

The figures in column (6), headed $F(d/i)^3$, are most easily obtained by multiplying the items in column (5), respectively, by the corresponding items in column (3). The algebraic sum of figures in column (6), divided by N, equals the third moment (in class-interval units) about the arbitrary origin.

The figures in column (7), headed $F(d/i)^4$, are obtained by multiplying the items in column (6), respectively, by the corresponding items in column (3). The sum of figures in column (7), divided by N, equals the fourth moment (in class-interval units) about the arbitrary origin.

The figures in column (8), headed $(d/i + 1)^4$, are obtained by adding 1, respectively, to each figure in column (3) and raising the result to its fourth power. All figures in this column are readily obtained by using a table of powers of numbers.¹ The sum of column (8) is not used.

The figures in column (9), headed $F(d/i + 1)^4$, are obtained by multiplying the items in column (8), respectively, by corresponding items in column (2). The sum of column (9) is used to check the arithmetical accuracy of all calculations in the work sheet.

When the work sheet is completed, it will show the following values:

    $A$, $i$, $N$, $\sum F(d/i)$, $\sum F(d/i)^2$, $\sum F(d/i)^3$, and $\sum F(d/i)^4$

In addition, by means of columns (8) and (9), the work sheet provides a cross check on its internal calculations, since the expansion of $\sum F(d/i + 1)^4$ gives the following terms:

    $\sum F(d/i)^4 + 4\sum F(d/i)^3 + 6\sum F(d/i)^2 + 4\sum F(d/i) + \sum F$

It follows that on a correctly constructed work sheet the sum of column (9) equals the sum of column (7) plus four times the sum of column (6) plus six times the sum of column (5) plus four times the sum of column (4) plus the sum of column (2). This is called a "Charlier check" after the name of the man who first suggested its use as a checking device.

For Table 15 the Charlier check is as follows:

     Σ [column (2)]                      =     300
    4Σ [column (4)]   = 4 × 292          =   1,168
    6Σ [column (5)]   = 6 × 2,140        =  12,840
    4Σ [column (6)]   = 4 × 5,590        =  22,360
     Σ [column (7)]                      =  45,088
    Sum = Σ [column (9)]                 =  81,756
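The work-sheet columns and the Charlier check can be reproduced with a short Python sketch. The frequencies are those of Table 14, with A = 69.5 and i = 1; the variable names are assumptions chosen to mirror the column numbers and are not part of the original work sheet.

```python
# Columns (1)-(2): mid-values and frequencies from Table 14.
mid_values = [62.5 + k for k in range(16)]
F = [1, 2, 1, 4, 12, 31, 31, 47, 48, 42, 35, 21, 14, 8, 2, 1]
A, i = 69.5, 1.0

# Column (3): deviations from the arbitrary origin in class-interval units.
d = [round((x - A) / i) for x in mid_values]

# Columns (4)-(7): each is the previous column multiplied by column (3).
col4 = [f * u for f, u in zip(F, d)]
col5 = [c * u for c, u in zip(col4, d)]
col6 = [c * u for c, u in zip(col5, d)]
col7 = [c * u for c, u in zip(col6, d)]

# Columns (8)-(9): (d/i + 1)^4 and F(d/i + 1)^4, used only for checking.
col8 = [(u + 1) ** 4 for u in d]
col9 = [f * c for f, c in zip(F, col8)]

N = sum(F)
sums = (N, sum(col4), sum(col5), sum(col6), sum(col7))

# Charlier check: sum of column (9) against the binomial expansion.
charlier_expected = sum(col7) + 4 * sum(col6) + 6 * sum(col5) + 4 * sum(col4) + N
charlier_observed = sum(col9)

print(sums)
print(charlier_observed, charlier_expected)
```

The check does not prove that the frequencies were copied correctly; it only detects slips in the multiplications and additions within the work sheet, since both sides are computed from the same columns (2) and (3).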


¹ Cf. Mathematical Tables from Handbook of Chemistry and Physics, pp. 153-173. For use in making calculations there are a number of convenient devices such as the slide rule and calculating machines, as well as logarithms. There are also several useful printed tables such as Barlow's Tables of Squares, Cubes, Square-roots, Cube-roots, and Reciprocals of Integers up to 10,000 and Karl Pearson's Tables for Statisticians and Biometricians. A set of logarithms will be found in Appendix, Table I.



Table 15. Work Sheet for Making Calculations in the Analysis of a Frequency Distribution

Description of Data: Heights of 300 Princeton University freshmen, Class of 1943

Source of Data: Princeton University's Department of Health and Physical Education

i = 1 in.

A = 69.5 in. (Mid-point of class interval near center of distribution)







Moments about the Arbitrary Origin. The moments about an arbitrary origin can be quickly calculated from the sums of columns (4) to (7), because by definition the moments about an arbitrary origin are as follows:

    $\nu_1 = \frac{\sum Fd}{N}$        $\nu_2 = \frac{\sum Fd^2}{N}$        $\nu_3 = \frac{\sum Fd^3}{N}$        $\nu_4 = \frac{\sum Fd^4}{N}$

where $X - A = d$.

If A were zero, d would equal X; and the moments would then reduce to the form shown in Chap. VI.

When, as in Table 15, the deviations have been taken in class-interval units rather than in original units, the formulas for the moments about an arbitrary origin would be written as follows:¹

    $\nu'_1 = \frac{\sum F(d/i)}{N}$        $\nu'_2 = \frac{\sum F(d/i)^2}{N}$        $\nu'_3 = \frac{\sum F(d/i)^3}{N}$        $\nu'_4 = \frac{\sum F(d/i)^4}{N}$        (1)

where $X - A = (d/i)(i)$, in which i is the class interval.

¹ Cf. p. 181. The prime on ν means that the ν is in class-interval units; i.e., $\nu'_1 = \nu_1/i$, $\nu'_2 = \nu_2/i^2$, etc.


Accordingly, the moments in class-interval units about an arbitrary origin are obtained from the sums of columns (4) to (7) of the work sheet by dividing each by N [the sum of column (2)]. In Table 15 the moments about the arbitrary origin in class-interval units are as follows:

    $\nu'_1 = \frac{292}{300} = 0.97333$
    $\nu'_2 = \frac{2,140}{300} = 7.13333$
    $\nu'_3 = \frac{5,590}{300} = 18.63333$
    $\nu'_4 = \frac{45,088}{300} = 150.29333$

Moments about the Arithmetic Mean. When the moments about an arbitrary origin are obtained, the moments about the mean are obtained from the following equations:¹

    $\mu'_1 = \nu'_1 - \nu'_1 = 0$
    $\mu'_2 = \nu'_2 - \nu'^2_1$
    $\mu'_3 = \nu'_3 - 3\nu'_2\nu'_1 + 2\nu'^3_1$
    $\mu'_4 = \nu'_4 - 4\nu'_3\nu'_1 + 6\nu'_2\nu'^2_1 - 3\nu'^4_1$        (2)

The moments about the arbitrary origin having been calculated for the frequency distribution of freshmen heights in Table 15, the moments about the arithmetic mean in class-interval units may now be obtained by the use of Eqs. (2), as follows:

    $\mu'_1$ = 0.97333 - 0.97333 = 0
    $\mu'_2$ = 7.13333 - (0.97333)² = 7.13333 - 0.94737 = 6.18596
    $\mu'_3$ = 18.6333 - 3(7.13333)(0.97333) + 2(0.97333)³
             = 18.6333 - 20.82924 + 1.84420 = -0.35171
    $\mu'_4$ = 150.2933 - 4(18.63333)(0.97333) + 6(7.13333)(0.97333)² - 3(0.97333)⁴
             = 150.29333 - 72.54552 + 40.54740 - 2.69253 = 115.60268
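Equations (2) lend themselves to a direct numerical check. The Python sketch below starts from the work-sheet sums quoted in the text (292, 2,140, 5,590, and 45,088 with N = 300) and converts the moments about the arbitrary origin into moments about the mean. The function name is an illustrative assumption. Because the sketch carries full precision rather than five decimals, its last digits may differ slightly from the hand calculation.

```python
def moments_about_mean(v1, v2, v3, v4):
    """Apply Eqs. (2): moments about the mean from moments about an origin."""
    m1 = v1 - v1                                   # always zero
    m2 = v2 - v1 ** 2
    m3 = v3 - 3 * v2 * v1 + 2 * v1 ** 3
    m4 = v4 - 4 * v3 * v1 + 6 * v2 * v1 ** 2 - 3 * v1 ** 4
    return m1, m2, m3, m4


N = 300
column_sums = (292, 2140, 5590, 45088)             # sums of columns (4) to (7)
v1, v2, v3, v4 = (s / N for s in column_sums)      # moments about A, in d/i units

m1, m2, m3, m4 = moments_about_mean(v1, v2, v3, v4)

print(round(v1, 5), round(v2, 5), round(v3, 5), round(v4, 5))
print(m1, round(m2, 5), round(m3, 4), round(m4, 4))
```

All results are in class-interval units; with i = 1 inch, as here, they are also in powers of inches.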


Equations (2) for finding the moments about the mean from the moments about an arbitrary origin may be proved as follows:

¹ $\mu'_1 = \mu_1/i$, $\mu'_2 = \mu_2/i^2$, $\mu'_3 = \mu_3/i^3$, etc.

Since, in Eqs. (1) for moments about an arbitrary origin, $i(d/i) = X - A$, it follows that

    $F_1 (d_1/i)\, i = F_1(X_1 - A) = F_1X_1 - F_1A$
    $F_2 (d_2/i)\, i = F_2(X_2 - A) = F_2X_2 - F_2A$
    . . . . . . . . . . . . . . . . . . . .
    $F_n (d_n/i)\, i = F_n(X_n - A) = F_nX_n - F_nA$

By adding,

    (a)    $\sum F(d/i)\, i = \sum FX - NA$

since $\sum F = N$.

Because A is a constant, $d_1, d_2, \ldots, d_n$ will vary in proportion as $X_1, X_2, \ldots, X_n$ vary. Also, since A is a constant, the sum of the A's may be written as the constant multiplied by the total frequencies, or NA. If now Eq. (a) is divided by N,

    (b)    $\frac{\sum F(d/i)\, i}{N} = \frac{\sum FX}{N} - A$

But, by definition,

    $\frac{\sum F(d/i)(i)}{N} = \nu'_1(i) = \nu_1$

and

    $\frac{\sum FX}{N} = \bar{X}$, the arithmetic mean

Therefore, by substitution and transposing, Eq. (b) becomes

    $\bar{X} = A + \nu'_1(i)$    or    $\bar{X} = A + \nu_1$        (3)

Accordingly the arithmetic mean of the frequency distribution of 300 freshmen heights shown in Table 15 is as follows:²

    $\bar{X}$ = 69.5 + 0.97333, since i = 1
              = 70.47 in.

It has thus been proved that the arithmetic mean equals any arbitrary quantity plus the first moment about that arbitrary quantity. In other words, the arithmetic mean of a series of magnitudes is equal to any arbitrary quantity plus the mean of the deviations from the arbitrary quantity. From Eq. (3) and from the fact that $d = X - A$ it follows that $A = X - d$ and that $\bar{X} = X - d + \nu_1$. Therefore, $X - \bar{X} = d - \nu_1$, and

    (c)    $x = d - \nu_1$

or if d is in class-interval units,

    (d)    $\frac{x}{i} = \frac{d}{i} - \nu'_1$

This value for x may be substituted in the equations defining the moments about the mean, as follows:³

    $\mu_1 = \frac{\sum Fx}{N} = \frac{\sum F(d - \nu_1)}{N}$
    $\mu_2 = \frac{\sum Fx^2}{N} = \frac{\sum F(d - \nu_1)^2}{N}$
    $\mu_3 = \frac{\sum Fx^3}{N} = \frac{\sum F(d - \nu_1)^3}{N}$
    $\mu_4 = \frac{\sum Fx^4}{N} = \frac{\sum F(d - \nu_1)^4}{N}$

² The result of calculation is 70.47333; but since the beginning data were significant to only two places beyond the decimal, the figures beyond .47 are not significant. The manner in which the figures are written in Table 11, which was taken from the source of the data, indicates accuracy to two decimal places. Had the numbers been rounded off to the nearest inch, the calculated mean would have significant figures to the nearest inch. Nevertheless, if the value of the mean is to be used for making further mathematical calculations to obtain other statistics, it should be carried out to several more decimal places in order to give an accurate result to two places in the additional statistics.

³ For definition of moments about the mean, cf. p. 181.


FREQUENCY-DISTRIBUTION ANALYSIS 


215 


After expanding and collecting like terms, these equations become 



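From the values given for the moments $\nu$ about the arbitrary origin
in Eqs. (1), Eqs. (5) may now be written as follows:

$$\mu_1 = \nu_1 - \nu_1 = 0$$
$$\mu_2 = \nu_2 - \nu_1^2$$
$$\mu_3 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3$$
$$\mu_4 = \nu_4 - 4\nu_3\nu_1 + 6\nu_2\nu_1^2 - 3\nu_1^4$$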

which it was said at the beginning of this section would be proved. 
An important corollary follows from the above derivation of the 
second moment (or “variance,” as it is sometimes called). 
Since

$$\mu_2 = \nu_2 - \nu_1^2$$

it follows that the mean square deviation about the mean of the
observations is less than the mean square deviation about any
arbitrary quantity; that is, the mean square deviation ($\sigma^2$) about
the mean is a minimum, smaller than it would be if calculated
from any other average. This is obvious from the equation;
since $\nu_1^2$ is a positive quantity, being a square, $\mu_2$ must be less
than $\nu_2$.
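As an illustration of these relations, the following sketch converts moments about an arbitrary origin into moments about the mean and compares them with moments computed directly from deviations about the mean. It assumes the freshmen-height frequencies of this chapter with A = 69.5 and i = 1; the function names are hypothetical.

```python
freqs = [1, 2, 1, 4, 12, 31, 31, 47, 48, 42, 35, 21, 14, 8, 2, 1]
devs = list(range(-7, 9))   # (mid-value - 69.5) / i for classes 62- to 77-
N = sum(freqs)

def moment_about_origin(k):
    """k-th moment about the arbitrary origin, in class-interval units."""
    return sum(f * d ** k for f, d in zip(freqs, devs)) / N

def moment_about_mean(k, mean_dev):
    """k-th moment about the mean, computed directly from deviations."""
    return sum(f * (d - mean_dev) ** k for f, d in zip(freqs, devs)) / N

nu1, nu2, nu3, nu4 = (moment_about_origin(k) for k in (1, 2, 3, 4))

# Conversion formulas for the moments about the mean.
mu2 = nu2 - nu1 ** 2
mu3 = nu3 - 3 * nu2 * nu1 + 2 * nu1 ** 3
mu4 = nu4 - 4 * nu3 * nu1 + 6 * nu2 * nu1 ** 2 - 3 * nu1 ** 4

print(round(mu2, 4), round(mu3, 4), round(mu4, 3))
```

The direct and converted values agree to rounding error, and mu2 is smaller than nu2, as the corollary on the second moment requires.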

The Standard Deviation. The standard deviation about the
arithmetic mean may now be quickly calculated, since it is by
definition the square root of the second moment. For the
frequency distribution of heights of 300 freshmen the standard
deviation is as follows:

$$\sigma = \sqrt{\mu_2} = 2.487 \text{ or } 2.49 \text{ in.}$$

Since the moments were calculated in class-interval units (see
page 212), this result is also in class-interval units. The standard
deviation in original units is found by multiplying by $i$. In the
present problem, $i = 1$; hence the standard deviation in original
units is also 2.49 in.

The Beta Coefficients. For the frequency distribution of heights
of 300 freshmen, the first two beta coefficients are as follows:

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = 0.00052$$

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = 3.02102$$

Since the betas are ratios having $i$ raised to the same power
in both numerator and denominator, the fact that the moments
are in class-interval units instead of original units may be
disregarded.
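The statement that the unit of measurement may be disregarded in the betas can be illustrated with a small sketch. It assumes the freshmen-height frequencies of this chapter; the conversion to centimetres is introduced here purely for illustration and is not part of the original problem.

```python
import math

lower_limits = list(range(62, 78))
freqs = [1, 2, 1, 4, 12, 31, 31, 47, 48, 42, 35, 21, 14, 8, 2, 1]

def sigma_and_betas(mids, freqs):
    """Return (sigma, beta1, beta2) from class mid-values and frequencies."""
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, mids)) / n
    mu2, mu3, mu4 = (
        sum(f * (m - mean) ** k for f, m in zip(freqs, mids)) / n
        for k in (2, 3, 4)
    )
    return math.sqrt(mu2), mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

mids_in = [low + 0.5 for low in lower_limits]   # inches
mids_cm = [2.54 * m for m in mids_in]           # same data in centimetres

sigma_in, beta1_in, beta2_in = sigma_and_betas(mids_in, freqs)
sigma_cm, beta1_cm, beta2_cm = sigma_and_betas(mids_cm, freqs)

print(round(sigma_in, 3), round(beta1_in, 5), round(beta2_in, 3))
```

The standard deviation scales with the unit, while the two betas come out the same in either unit.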

Measures of Skewness and Kurtosis. Measures of skewness
and kurtosis are also readily determined from the moments about
the mean. In the frequency distribution of heights of 300
freshmen, the measure of kurtosis, $\beta_2$, calculated above, is 3.021,
slightly larger than 3. Hence the frequency distribution is
somewhat less flat-topped than the normal curve.¹

Skewness in heights of the 300 freshmen, measured by the
cube root of the third moment, is -0.7057 class intervals.
Since $i = 1$ in., this is -0.7057 inch.

¹ Figure 101, p. 29o, is a graph comparing the frequency distribution
with the ideal normal curve.

CALCULATION OF OTHER STATISTICS

Averages and Measures of Variability

Difficulties in Locating the Median and the Mode. Consideration
of the median, the mode, and the quartiles has been left to the
last for the reason that, in the analysis of frequency distributions
with class intervals, these values must be estimated. By
definition, the median is the value at the center of the
distribution, the first quartile is the value midway between the
lower limit of the range and the median, and the third quartile is
the value midway between the median and the upper limit of the
range. The mode is the value that occurs with the greatest
frequency. The calculations of these statistics are not based on
the work sheet shown in Table 15.

Because they are concealed in the class interval among a group
of other cases in the same class interval, the quartiles, the median,
and the mode must be obtained by estimation. Where within
the range of the class interval is the median? Where within
the range of the class interval with the largest frequency is the
mode? These questions have to be answered by interpolation,
and the value so obtained becomes an abstract quantity, as
abstract and mathematical in character as the mean, but without
the latter's precision.

The Mode. In the case of the mode, a further difficulty arises
in finding the correct answer to the question: Which class
interval should be considered to contain the mode? If different-
sized class intervals are taken in each of several frequency
distributions of the same data, the modal class interval will be
observed to shift around. The mode, by definition the simplest
of the several measures, is actually the most difficult average
to locate. Its accurate computation is more highly mathematical
than that of the arithmetic mean. If a Pearsonian curve gives a
good fit to the data, the ideal method of obtaining the mode is to
find the mode of this curve. A formula for this is given on the
next page. The disadvantage of this method is that there is no
way of telling whether a curve is a good fit or not until it is
actually fitted, and this involves a considerable amount of
calculation just for the sake of finding the mode.

But simpler measures of the mode are often used. These
are interpolated values, on the assumption that the mode lies
in the modal class interval, that is, the class interval that has
the highest frequency. It is assumed that the general shape
of the distribution affects the distribution of cases at the point
of greatest concentration in the following manner: All the
frequencies below the modal class interval are pulling the mode
toward the lower limit of that class interval, and all the frequencies
above the modal class interval are pulling the mode toward the
upper limit of the interval. The mode is equal to the lower
limit of the modal class interval plus the interpolated part of
the class interval established by the relationship of the frequencies
above and below that class interval. In the frequency distribution
of freshmen heights (Table 15), the modal class interval,
that is, the class interval with the greatest concentration of cases,
is 70-. There are 129 cases pulling the mode toward the lower
limit of the class interval 70- and 123 cases pulling the mode
toward the upper limit. Consequently,

$$Mo = 70 + \frac{123}{129 + 123} \times 1 = 70.488 \text{ or } 70.49 \text{ in.}$$
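The interpolation just described can be sketched as a small function. This is only an illustration under the stated assumption that the mode lies in the class of highest frequency; the function name and its arguments are hypothetical.

```python
def interpolated_mode(lower_limits, freqs, width):
    """Interpolate the mode inside the modal class.

    The cases above the modal class pull the mode toward its upper limit,
    the cases below pull it toward the lower limit.
    """
    k = max(range(len(freqs)), key=lambda j: freqs[j])   # modal class index
    below = sum(freqs[:k])
    above = sum(freqs[k + 1:])
    return lower_limits[k] + above / (below + above) * width

lower_limits = list(range(62, 78))
freqs = [1, 2, 1, 4, 12, 31, 31, 47, 48, 42, 35, 21, 14, 8, 2, 1]

mode = interpolated_mode(lower_limits, freqs, 1.0)
print(round(mode, 3))
```

If two classes tie for the highest frequency, this sketch simply takes the first, which is one more sign of how unstable the interpolated mode can be.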

The so-called "mathematical mode," an approximation of the
mode of the Pearsonian curve that is invalid if the frequency
curve is very skewed, is calculated from the following equation:¹

$$Mo = \bar{X} - 3(\bar{X} - Mi)$$

For the frequency distribution of 300 freshmen heights the
mathematical mode is calculated as follows:²

$$Mo = 70.47333 - 3(70.47333 - 70.4375) = 70.366 \text{ or } 70.37 \text{ in.}$$

The mode of the Pearsonian curve fitted to the data is given
by the formula

$$Mo = \bar{X} - \sigma \, sk$$

where

$$sk = \pm\frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}$$

For the frequency distribution of 300 freshmen heights the
mode calculated by this equation is as follows:

$$Mo = 70.50 \text{ in.}$$
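The two formulas can be compared numerically with a short sketch. It takes as given the rounded statistics quoted in this chapter (mean, median, standard deviation, and the two betas) and gives sk the sign of the third moment, which is negative here; the variable names are illustrative.

```python
import math

# Rounded statistics quoted in the text for the 300 freshmen heights.
mean, median, sigma = 70.47333, 70.4375, 2.487
beta1, beta2 = 0.00052, 3.02102
sign_of_third_moment = -1   # the third moment about the mean is negative

# So-called mathematical mode, from the mean and the median.
mathematical_mode = mean - 3 * (mean - median)

# Mode of the Pearsonian curve, from the skewness coefficient sk.
sk = sign_of_third_moment * math.sqrt(beta1) * (beta2 + 3) / (
    2 * (5 * beta2 - 6 * beta1 - 9)
)
pearson_mode = mean - sigma * sk

print(round(mathematical_mode, 3), round(sk, 4), round(pearson_mode, 2))
```

The first estimate falls below the mean and the second above it, so the two formulas disagree about the direction of the mode in this distribution.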


¹ Cf. p. 173.

² The median is 70.4375. Cf. the next section.

The Median and the Quartiles. Determination of the median
and the quartiles by interpolation is reasonably accurate if, as
it is assumed, the cases are evenly distributed within the class
interval containing the median and the two quartiles, respectively.
The calculation of the median and the quartiles is
facilitated by making a column of cumulated frequencies as
shown in Table 16. The median is equal to the lower limit of
the class containing the N/2th case plus an interpolated amount
within the class interval determined by the ratio of the balance
of frequencies necessary to make up N/2 frequencies to the
frequencies in the interval. In the frequency distribution of
freshmen heights (Table 16), N/2 = 150. The frequencies
are counted cumulatively from the lower limit of the first class
interval (top of the table). By this count, there are 129 cases
to the lower limit of the class interval 70-. When the point
70 is reached on the quantity scale, 129 cases have been counted;
but the median is the value of the 150th case, that is, 21 cases
beyond 70. From 70 to 71 there are 48 cases. Consequently,
the ratio of interpolation within the class interval is 21/48.
Accordingly, the estimate of the median in freshmen heights is as
follows:

$$Mi = 70 + \frac{21}{48} \times 1 = 70.4375 \text{ or } 70.44 \text{ in.}$$


Table 16. Frequency Distribution of the Heights of 300 Princeton
Freshmen, Class of 1943
(In inches. Class interval 1 in.)

   X        F     Cumulative F
  62-       1           1
  63-       2           3
  64-       1           4
  65-       4           8
  66-      12          20
  67-      31          51
  68-      31          82
  69-      47         129
  70-      48         177
  71-      42         219
  72-      35         254
  73-      21         275
  74-      14         289
  75-       8         297
  76-       2         299
  77-       1         300
          ---
          300

1. There are 129 cases to X = 70.

2. Since N/2 = 150.0, this leaves 150.0 - 129, or 21.0 cases to go, of the
48 cases in the next class interval (70-71).

3. The interpolated amount of the class-interval range is therefore
21/48 X 1.





The third and first quartiles are calculated by interpolating
in a similar manner for the values of the 3N/4th and the N/4th
cases. In the frequency distribution of the heights of 300
freshmen, following are the values of the quartiles:

$$Q_1 = 68 + \frac{24}{31} \times 1 = 68.774 \text{ or } 68.77 \text{ in.}$$
$$Q_3 = 72 + \frac{6}{35} \times 1 = 72.171 \text{ or } 72.17 \text{ in.}$$
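The same interpolation serves for the median and both quartiles, as the following sketch suggests. It assumes, as the text does, that cases are evenly distributed within each class interval; the function name and the argument p (the proportion of cases below the desired value) are hypothetical.

```python
def interpolated_quantile(lower_limits, freqs, width, p):
    """Value below which a proportion p of the cases lie, by interpolation."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    target = p * sum(freqs)          # e.g. N/2 for the median
    cumulative = 0
    for low, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= target:
            # Balance still needed, as a fraction of this class's frequency.
            return low + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies do not reach the target count")

lower_limits = list(range(62, 78))
freqs = [1, 2, 1, 4, 12, 31, 31, 47, 48, 42, 35, 21, 14, 8, 2, 1]

q1 = interpolated_quantile(lower_limits, freqs, 1.0, 0.25)
median = interpolated_quantile(lower_limits, freqs, 1.0, 0.50)
q3 = interpolated_quantile(lower_limits, freqs, 1.0, 0.75)
print(round(q1, 3), round(median, 4), round(q3, 3))
```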

The Average Deviation. The average, or mean, deviation
is a measure of dispersion that has its minimum value when
deviations are measured from the median. To compute the
average deviation from the median, subtract each of the N values
of X from the median, add the absolute values of the deviations,
and divide by N. Thus,

$$\text{A.D.} = \frac{\Sigma|X - Mi|}{N}$$

The average deviation is simpler in concept than any other
measure of dispersion. It is less affected by extreme deviations
than the more popular standard deviation, and for this reason
it probably has greater sampling reliability from extremely
leptokurtic universes. In spite of these advantages the average
deviation is not a popular measure of dispersion, partly because
of several widely accepted but mistaken notions concerning its
properties.

It is often said that it is illogical to neglect the signs of
deviations to be averaged and that this fallacy is avoided in the case
of other measures of dispersion. It is true that the mean
deviation from the median is the mean of absolute deviations from
some average, but every other measure of dispersion is also equal
(or proportional) to an average of absolute deviations from some
average. The quartile deviation is the median of absolute
deviations from the mid-quartile, and the standard deviation
is the quadratic mean of absolute deviations from the mean.
It has been said that the sampling reliability of the average
deviation is less than that of the standard deviation. This may
be true for normal universes, but it can hardly be true for all
types.

Grouped Data: Mid-value Assumption in Calculating Average
Deviation. When data are grouped in the form of a frequency
distribution with equal class intervals, the average deviation can
be written in the simple form

$$\text{A.D.} = \frac{i\,\Sigma|F(d/i)|}{N} \qquad (7)$$

where $d$ is the deviation of class mid-values from the mid-value
of the class interval containing the median. Although Eq. (7)
is the exact value of the average deviation from the median
according to the assumption that all observations in every class
interval are equal to the mid-value of the interval (the same
assumption commonly used for the standard deviation), many
statisticians consider it unsatisfactory as a formula for the
average deviation. The chief reason for the dissatisfaction seems
to be that the mid-value assumption, which implies that the
median is the mid-value of the median interval, is inconsistent
with the ordinary notion of the interpolated median.

In applying the simple formula in practice, several corrections
may be used, some of which will be illustrated below. Each of
these corrections deals with a separate aspect of the problem of
approximating the average deviation of ungrouped data from a
frequency distribution. The two most important corrections
are usually of the same order of magnitude, but opposite in sign,
so that they tend to offset each other. For this reason, it is
usually advisable to use the simpler formula without correction,
because of its simplicity, unless the problem is of great importance
so that minor adjustments are worth making.

Grouped Data: Histogram Assumption in Calculating Average
Deviation. The average deviation of the histogram considered
as a continuous frequency function is often used in preference
to the simple formula for the average deviation presented in Eq.
(7). This corresponds to the assumption on which the usual
interpolated median is based. The median is the abscissa of
the vertical line that divides the histogram into two equal
areas. When the left half of the histogram is folded along this
vertical line, over the half on the right, the average deviation
is the first moment about the line of folding.

To simplify the derivation, let $d/i$ represent deviations from
the mid-value of the median interval, and let

$$Mi = L + ci \qquad (8)$$

where $L$ is the lower limit of the median interval, $i$ is the width
of the class interval, and $c$ is the proportion of observations in
the median interval that are less than $Mi$. It is to be noted
that the cases are assumed to be distributed uniformly through
the interval.

In these terms the formula for average deviation can be written
as follows:

$$\text{A.D.} = \frac{i\,\Sigma|F(d/i)| + C_1 + C_2}{N} = \frac{i\left[\Sigma|F(d/i)| + c(1 - c)F_0\right]}{N} \qquad (9)$$

in which $F_0$ is the frequency of the median interval, $C_1$ is the
amount of correction associated with observations above and
below the median interval, and $C_2$ is the amount of correction
associated with the median interval itself.

[Fig. 74. Illustration of distribution of cases in and above and below
the median interval.]
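To demonstrate the truth of this equation, consider the
diagram of the median interval shown in Fig. 74. Since deviations
from the mid-value of the median interval are $(\tfrac{1}{2} - c)i$
too small for observations above the median interval and
$(\tfrac{1}{2} - c)i$ too large for those below the median interval, and
since there are $(2c - 1)F_0$ more observations above the median
interval than below it, it follows that

$$C_1 = i(\tfrac{1}{2} - c)(2c - 1)F_0 = iF_0(2c - 2c^2 - \tfrac{1}{2}) \qquad (10)$$

The area in the median interval below $Mi$ is $cF_0$, and its mean
deviation from $Mi$ is $ci/2$. Similarly, $(1 - c)F_0$ lies above $Mi$
with a mean deviation of $(1 - c)i/2$. Hence

$$C_2 = cF_0\left(\frac{ci}{2}\right) + (1 - c)F_0\left(\frac{(1 - c)i}{2}\right) = iF_0(c^2 - c + \tfrac{1}{2}) \qquad (11)$$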



From Eqs. (10) and (11), the combined corrections are found to be

$$C_1 + C_2 = iF_0(2c - 2c^2 - \tfrac{1}{2}) + iF_0(c^2 - c + \tfrac{1}{2}) = iF_0(c - c^2) = iF_0c(1 - c) \qquad (12)$$

a result that verifies Eq. (9). Equation (9) is probably the most
convenient form available for computing the mean deviation
according to the histogram assumption.

Calculation of the average deviation by the use of Eq. (9)
is illustrated by Table 17 and the ensuing analysis.


Table 17. Frequency Distribution of the Heights of 300 Princeton
Freshmen, Class of 1943
(In inches. Class interval 1 in.)

   X        F      d/i     F(d/i)
  62-       1      -8        -8
  63-       2      -7       -14
  64-       1      -6        -6
  65-       4      -5       -20
  66-      12      -4       -48
  67-      31      -3       -93
  68-      31      -2       -62
  69-      47      -1       -47
  70-      48       0         0
  71-      42       1        42
  72-      35       2        70
  73-      21       3        63
  74-      14       4        56
  75-       8       5        40
  76-       2       6        12
  77-       1       7         7
          ---              ----
          300              -298
                           +290
          Σ (without regard to sign) = 588


When the median and the quartiles were calculated, it was
assumed that the frequencies were evenly distributed in the class
intervals. This assumption is continued while calculating the
average deviation about the median. As shown in Table 17,
the sum of the deviations about the arbitrary origin without
regard to sign is 588. That is,

$$\Sigma|F(d/i)| = 588$$

where $d_1 = X_1 - A$
$d_2 = X_2 - A$
$\cdots$
$d_n = X_n - A$

The sum desired is the sum without regard to sign of deviations
from the median. That is,

$$\Sigma|F(x'/i)|$$

where $x'_1 = X_1 - Mi$
$x'_2 = X_2 - Mi$
$\cdots$
$x'_n = X_n - Mi$

NOTE: $x$ has been used to symbolize the deviations from the arithmetic
mean; $x'$ is used to symbolize deviations from the median.

Accordingly the above sum, 588, which for the illustration
chosen is $\Sigma|F(d/i)|$, can be adjusted by a calculated correction
that will change the sum to $\Sigma|F(x'/i)|$. This correction is
obtained by using Eq. (9).

From Table 17 and the analysis on pages 221 to 223 it is to be
noted that $F_0 = 48$, the frequency of the interval containing
the median, $i = 1$, and $c = 0.44$, since the median is 70.44, the
lower limit of the interval containing the median is 70, and $c$ is
the proportion of observations in the median interval that are
less than the median. Accordingly, the average deviation may
be calculated by using Eq. (9) as follows:

$$\text{A.D.} = \frac{i\left[\Sigma|F(d/i)| + c(1 - c)F_0\right]}{N} = \frac{1\,[588 + 0.44(0.56)(48)]}{300} = \frac{588 + 11.83}{300} = 2.00 \text{ in.}$$
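As a check on Eq. (9), the sketch below computes the average deviation three ways for the freshmen heights: by the mid-value formula of Eq. (7), by the corrected formula of Eq. (9), and directly by treating each class as uniformly filled. The helper mean_abs_dev_uniform is a hypothetical name introduced for this illustration, and c is carried at its unrounded value 21/48 rather than the rounded 0.44 used in the text.

```python
lower_limits = list(range(62, 78))
freqs = [1, 2, 1, 4, 12, 31, 31, 47, 48, 42, 35, 21, 14, 8, 2, 1]
i = 1.0
N = sum(freqs)

# Locate the median interval: L is its lower limit, F0 its frequency.
cumulative = 0
for L, F0 in zip(lower_limits, freqs):
    if cumulative + F0 >= N / 2:
        break
    cumulative += F0
c = (N / 2 - cumulative) / F0      # proportion of the interval below Mi
median = L + c * i                 # Eq. (8)

# Sum of deviations from the mid-value of the median interval, sign ignored.
S = sum(f * abs(low - L) / i for low, f in zip(lower_limits, freqs))

ad_mid_value = i * S / N                         # Eq. (7)
ad_histogram = i * (S + c * (1 - c) * F0) / N    # Eq. (9)

def mean_abs_dev_uniform(low, high, m):
    """Mean |x - m| for x spread uniformly over the interval low..high."""
    if m <= low:
        return (low + high) / 2 - m
    if m >= high:
        return m - (low + high) / 2
    return ((m - low) ** 2 + (high - m) ** 2) / (2 * (high - low))

ad_direct = sum(
    f * mean_abs_dev_uniform(low, low + i, median)
    for low, f in zip(lower_limits, freqs)
) / N

print(S, round(ad_mid_value, 3), round(ad_histogram, 4), round(ad_direct, 4))
```

The corrected formula and the direct computation agree, which is what the derivation of Eqs. (10) to (12) asserts.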

The Semiquartile Range. The semiquartile range, one-half
of the difference between the third quartile and the first quartile,
is another statistic that measures variability. Its formula is

$$Q = \frac{Q_3 - Q_1}{2}$$

For the frequency distribution in Table 15, the semiquartile
range is calculated as follows:

$$Q = \frac{72.17143 - 68.77419}{2} = 1.69862 \text{ or } 1.70 \text{ in.}$$


Measures of Skewness. From statistics measuring variation
and central tendencies, important measures of skewness are
obtained. It has been noted that $\bar{X} - Mo$ is a measure of
skewness. In the frequency distribution of 300 Princeton
freshmen heights,

$$\bar{X} - Mo = 70.473333 - 70.36584 = 0.10749 \text{ or } 0.11 \text{ in.}$$

The position of the first and third quartiles in relation to the
median is a very convenient statistic measuring skewness,
namely,

$$Q_3 - Mi - (Mi - Q_1) \quad \text{or} \quad Q_1 + Q_3 - 2Mi$$

For the frequency distribution of heights of 300 Princeton
freshmen this statistic is

$$68.7742 + 72.1714 - 2(70.4375) = 0.07 \text{ in.}$$


COEFFICIENTS OF VARIABILITY 

The various aggregative measures of variability may
conveniently be expressed as relatives or coefficients, as explained
in the preceding chapter; indeed, they must be so expressed if
comparisons are to be made with other frequency distributions
having different types of units. The aggregative measures of
variability are converted into relatives or coefficients by dividing
the former by the mean, the median, or the average of the two
quartiles. For the present problem, the relative measures of
variability that would be useful in comparing this frequency
distribution with other frequency distributions are as follows:

$$V_\sigma = \frac{\sigma}{\bar{X}} = 3.53 \text{ per cent}$$

$$V_{A.D.} = \frac{\text{A.D.}}{Mi} = 2.84 \text{ per cent}$$

$$V_Q = \frac{Q_3 - Q_1}{Q_3 + Q_1} = 2.41 \text{ per cent}$$

The formula for $V_Q$ is really the semiquartile range divided
by the average of the two quartiles, but the 2's cancel out, leaving
merely the difference between the two quartiles in the numerator
and their sum in the denominator.
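The three coefficients of variability follow directly from statistics already obtained, as this brief sketch shows. It uses the rounded values quoted in this chapter; the variable names are illustrative only.

```python
# Statistics quoted in the text for the 300 freshmen heights (inches).
mean, median = 70.47333, 70.4375
sigma, average_deviation = 2.487, 2.00
q1, q3 = 68.77419, 72.17143

v_sigma = sigma / mean                  # standard deviation over the mean
v_ad = average_deviation / median       # average deviation over the median
v_q = (q3 - q1) / (q3 + q1)             # semiquartile range over mid-quartile

for name, value in (("V_sigma", v_sigma), ("V_AD", v_ad), ("V_Q", v_q)):
    print(f"{name} = {100 * value:.2f} per cent")
```

Because each coefficient is a pure ratio, the same percentages would result whether heights were recorded in inches or in any other unit.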


COEFFICIENTS OF SKEWNESS 

Statistics measuring skewness are likewise more significant
for comparative purposes when expressed as coefficients. The
various coefficients of skewness for the frequency distribution
in Table 10 are as follows:
Based on mathematical statistics:


sk 


■ ^ or - n 17 per cent 

$$sk = \pm\frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)} = -0.0112 \text{ or } -1.12 \text{ per cent}$$

NOTE: This is given the negative sign because the third moment is
negative.

Based on other statistics (using Mo = 70.488):

$$sk = \frac{\bar{X} - Mo}{\sigma} = \frac{-0.015 \text{ in.}}{2.487 \text{ in.}} = -0.006 \text{ or } -0.6 \text{ per cent}$$

(If the so-called mathematical mode, i.e., Mo = 70.366, is used, the
coefficient of skewness by the same formula would be +4.32 per cent.)

Using the median and the two quartiles to measure skewness,
the following result is obtained:

$$sk = \frac{Q_3 + Q_1 - 2Mi}{Q} = \frac{+0.0706 \text{ in.}}{1.6986 \text{ in.}} = +0.0416, \text{ or } +4.16 \text{ per cent}$$
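The coefficients based on the mode and on the quartiles can be reproduced with a few lines. The sketch assumes the rounded statistics quoted in this chapter and is meant only to show how sensitive the sign of the coefficient is to the choice of mode.

```python
# Statistics quoted in the text for the 300 freshmen heights (inches).
mean, median, sigma = 70.47333, 70.4375, 2.487
q1, q3 = 68.77419, 72.17143
interpolated_mode = 70.488
mathematical_mode = 70.366

semiquartile_range = (q3 - q1) / 2

sk_interpolated = (mean - interpolated_mode) / sigma
sk_mathematical = (mean - mathematical_mode) / sigma
sk_quartile = (q3 + q1 - 2 * median) / semiquartile_range

print(f"{100 * sk_interpolated:+.1f} per cent")
print(f"{100 * sk_mathematical:+.2f} per cent")
print(f"{100 * sk_quartile:+.2f} per cent")
```

The first coefficient is slightly negative while the other two are positive, although all three describe the same distribution.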


The difficulty of locating the mode even when quite a large
sample is taken is illustrated by the frequency distribution
analyzed in this chapter. In this illustration the negative third
moment, the mode of the Pearsonian curve (70.50), and the
interpolated mode (70.49) all indicate that the mode is larger than
the mean (70.47), but the so-called mathematical mode (70.37),
calculated from the mean and the median, is smaller than the mean.


GRAPHIC INTERPRETATION OF STATISTICS OF VARIABILITY
AND SKEWNESS

Figure 75 shows on a scale the relative location of the median,
the two quartiles, and the upper and lower limits found by taking
$Mi \pm \text{A.D.}_{Mi}$, namely, 70.44 ± 2.00. The figure illustrates the
fact that, when there is skewness, the location of the quartiles
with reference to the median reflects the presence of skewness.
If, therefore, the quartiles are used as measures of deviation,
they reflect the fact that, in skewed distributions, the deviation
is skewed in one direction or the other. If the average deviation
is used as a measure of deviation or variability, the presence of
skewness will not be noted in the results. Whenever the
distribution is skewed to any extent, the quartiles are unequal
distances from the median, as may be noticed in Fig. 75. As the
figure also illustrates, the average deviation is conceived as an
equal distance on each side of the median.

[Fig. 75. Illustration of significance of average deviation and two
quartiles as measures of dispersion.]

Figure 76 shows on a scale the relative location of the median
and average deviation and the location of the mean and the
standard deviation by depicting the upper and lower limits of
$Mi \pm \text{A.D.}_{Mi}$ (as in Fig. 75) and, in addition, the upper and
lower limits of $\bar{X} \pm \sigma$. As in the case of the average deviation,
so also in the case of the standard deviation, the measure of
variability is conceived as an equal distance above and below
the mean, that is, an equal distance from the mean on the
x-axis in both the positive and the negative directions in Figs.
75 and 76. If the distribution is skewed to a marked extent, it
should be evident that care must be exercised in interpreting the
significance of the average deviation or the standard deviation.
From Fig. 75 it is noted that the first quartile in the negative
direction and the third quartile in the positive direction are less
distant from the median than $\pm\text{A.D.}_{Mi}$. By definition, the
limits of the range between the first and third quartiles include
exactly 50 per cent of the cases. For a normal distribution¹ the
distance between the upper and lower limits defined by

[Fig. 76. Illustration of the standard deviation and average deviation
as measures of variability.]

It will be noted from Fig. 76 that the limits $\bar{X} \pm \sigma$ are farther,
respectively, in the positive and negative directions from the
mean than are the limits $Mi \pm \text{A.D.}_{Mi}$ from the median. The
standard deviation is always larger than the average deviation;
in fact, an approximate check² on the accuracy of calculation may
be used as follows: A.D. = 0.8σ. In the frequency distribution
illustrated, this check works fairly well, for 0.8(2.49) = 1.99 and
the calculated $\text{A.D.}_{Mi}$ = 2.00. For a normal distribution the
distance between the upper and lower limits defined by $\bar{X} \pm \sigma$
includes approximately two-thirds (68 per cent) of the cases.³

¹ See Chap. XI for description of a normal distribution.

² For more precise discussion and explanation, see Chaps. XI and XII.

³ For distributions that depart widely from the normal form, this check
may not be satisfactory.

FREQUENCY DISTRIBUTIONS WITH UNEQUAL CLASS INTERVALS

As remarked earlier in this chapter, the size of the class interval
should ordinarily be uniform throughout a given frequency
distribution; but in some cases, usually because there is a large
concentration of cases at one or the other extreme of the range,
it is considered necessary to use different-sized class intervals for
parts of the frequency distribution in order to give a proper
picture of variability. Table 18 illustrates such an instance.
Of 150 cases distributed over the range 0-51, 106 cases fell within
the limits 0-10. Obviously, a small number of class intervals
of uniform size would give a wholly erroneous notion of the
variation. Occasionally, data at its primary source will be
published in a manner similar to that of Table 18, and the
statistician has no choice but to utilize the material in frequency
distributions that have unequal-sized class intervals. This is
particularly true of statistics of wages and income and statistics
of hours of labor.


Table 18. Deaths Due to Automobile Accidents in 150 Cities,*
First 20 Weeks of 1940

 Number of deaths due to      Number of cities whose    Calculations of
 automobile accidents, X      automobile accident       deviations from an
                              fatalities were as        arbitrary origin
 Intervals     Mid-values     specified, F              (A = 15)†

    0-             0.5               11                     -1.45
    1-             2.0               23                     -1.30
    3-             4.0               34                     -1.10
    5-             7.5               38                     -0.75
   10-            15.0               24                      0
   20-            25.0               12                      1.00
   30-            35.0                4                      2.00
   40-51          45.5                4                      3.05
                                    ---
                                    150

* New York, Los Angeles, Chicago, and Detroit are excluded from these
statistics. United States Bureau of the Census, Weekly Accident Bulletin,
May 24, 1940, pp. 1-4.
† i = 10.

If the mid-value of the class interval 10- is taken as the
arbitrary origin, that is, A = 15, and the "class interval" or
abscissa scale unit i is taken equal to 10 (since that size interval
predominates), the deviations of class intervals in that part of the
frequency distribution where class intervals are equal are readily
determined. Where the class intervals are unequal, simple
subtraction of mid-values and the division of the answer by the
scale unit gives the results in the last column of Table 18. To
illustrate the process, there is a difference of 10.5 between
mid-value 35 and mid-value 45.5, a quantity 1.05 times the scale
unit. Accordingly, the deviation advances from 2.0 to 3.05
scale units. In the lower reaches of the range there is a difference
of 7.5 between mid-value 15 and mid-value 7.5, or 3/4 a scale unit;
consequently, the step-deviation change is from 0 to -0.75.
From mid-value 7.5 to mid-value 4, the deviation recedes 0.35
a scale unit to -1.10. From mid-value 4.0 to mid-value 1.5, a
distance of 1/4 an interval, the scale-unit deviation changes from
-1.10 to -1.35. Finally, from mid-value 1.5 to mid-value 0.5,
a distance of 1/10 a scale unit, the scale-unit deviation recedes
from -1.35 to -1.45.

From this point on, the analysis of the frequency distribution
is the same as it would be were uniform class intervals used,
although obviously the uneven numbers add somewhat to the
burden of filling in the work sheet according to the plan shown in
Table 10. Once the work sheet has been completed, however,
the fact that the class intervals are not uniform ceases to be a
consideration in the subsequent computations; the summation
figures can be applied in the formulas in precisely the same
manner as if the class intervals were uniform.
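The treatment of unequal class intervals can be sketched as follows. The mid-values and frequencies are those printed in Table 18, with A = 15 and i = 10; the variable names are illustrative, and only the mean is carried through, although higher moments would be handled in the same way.

```python
# Mid-values and frequencies as printed in Table 18.
mids = [0.5, 2.0, 4.0, 7.5, 15.0, 25.0, 35.0, 45.5]
freqs = [11, 23, 34, 38, 24, 12, 4, 4]

A = 15.0   # arbitrary origin: mid-value of the class 10-
i = 10.0   # scale unit (the predominant class-interval width)

# Step deviations in scale units; unequal intervals give fractional steps.
step_devs = [(m - A) / i for m in mids]

N = sum(freqs)
nu1 = sum(f * d for f, d in zip(freqs, step_devs)) / N

mean_from_steps = A + i * nu1
mean_direct = sum(f * m for f, m in zip(freqs, mids)) / N

print([round(d, 2) for d in step_devs])
print(round(mean_from_steps, 3))
```

Once the step deviations are written down, nothing further in the calculation depends on whether the intervals were equal.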

ACCURACY IN THE CALCULATION OF STATISTICS

Ordinary common sense would dictate that all recording of
figures needs to be carefully checked, since there is always a
chance of making a mistake in copying. Such mistakes are not
statistical errors to be disregarded under the "theory of errors,"
which is explained in Chap. XI. They cannot be disregarded,
and every effort should be made to prevent their occurrence.
The same applies to all calculations made, but frequently
short-cut or cross checks can be devised for these. While
accuracy is essential, a spurious accuracy may be introduced into
final answers. For example, in most cases final figures
representing samples should be presented in round numbers, including
only the significant figures in the arithmetical answers obtained.¹

¹ The meaning of "significant" is explained on p. 213, note.

Care must be taken, however, in cases where errors are likely to
accumulate through successive steps of calculation. It may be
necessary to retain the figures in a calculated result for a number
of places beyond the significant figures if that calculated result
is being used in the process of calculating other statistics. In
some statistical problems it is necessary to add a constant
successively perhaps fifty or even hundreds of times, or, similarly,
to multiply by a constant successively a large number of times.
In such instances the constant should be written to several more
places than will be used in the final answers in order to avoid
an error in significant figures at the end of the process. This is a
purely mathematical problem; in every case, the standard of
accuracy required, or the number of significant figures, having
been decided upon, a simple arithmetical calculation will show to
how many places the intermediary calculations must be carried.
The final results are then rounded off to the number of significant
figures.

In rounding numbers the rule is that a remainder less than
half a unit is disregarded, while half or more than half is counted
as an additional unit. Exactly half may be changed to the
nearest even number; thus 174.5 would be 174 but 175.5 would
be 176.
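The convention of sending an exact half to the nearest even number can be illustrated with Python's standard decimal module. This is only a sketch: the helper round_half_even is a hypothetical name, and the sample values are chosen for illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(value, places=0):
    """Round to the given number of decimal places, exact halves to even."""
    quantum = Decimal(1).scaleb(-places)   # 1, 0.1, 0.01, ...
    # Going through str() keeps the decimal digits as they were written.
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_EVEN)

print(round_half_even(2.5), round_half_even(3.5))   # exact halves
print(round_half_even(175.5))                        # exact half
print(round_half_even(70.47333, 2))                  # ordinary rounding
```

Python's built-in round applies the same even-number rule to exact halves, but it works on binary floating-point values, so a decimal-based helper of this kind is safer when the figures were recorded in decimal form.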



PART III

The Normal Frequency Curve

CHAPTER VIII
PROBABILITY

Up to this point, the discussion has primarily been concerned
with "descriptive statistics." Attention has centered upon
methods of summarizing and describing statistical variation.
Occasionally, theory has been employed to explain certain
methods or to indicate why one method is to be preferred to
another, but, in general, emphasis has been upon the facts as
such, rather than upon any theoretical explanation or inference
to be made from these facts.

In contrast, the next four chapters will be primarily concerned
with a particular body of theoretical statistics, namely, the theory
of the normal frequency curve. The question now to be
considered is not "what" is the character of a given frequency
distribution, but "why." The discussion will be abstract and
general and will not pertain to actual concrete data, except by
way of illustration.

Before this theoretical analysis can be undertaken, however,
certain mathematical tools must be acquired and certain
fundamental concepts clarified. That is the purpose of this and the
next chapters.

PERMUTATIONS AND COMBINATIONS 

Fennutalions Defined and lUustrat&l A “permutation” is 
an arrangement The word “man/’ for example, is a specul 
irrangcmcnt of the three letters »i, n, and « Other possible 
arrangements of the&c tliree letters are nina, nina, iiam, anm, 
and amn All these arrangements are jiermutations 
232 



PJiUBABlUTY 


233 


In general, it there are X different things, it is possible to form 
XI different permutations^ Consider again the three letters 
III, a, and «. In making various arrangements of these, it is 
possible to pick the first letter in three different -ways. The 
first letter having been picked, there arc then left two different 
wat'S for the selection of the second letter. Finally, the first 
two letters having been selected, there remains one, and only one, 
way for the selection of the last letter. Now, each one of the two 
ways that are open for the selection of the second letter can be 
combined uith each of the three ways that are open for the 
.‘•election of the first letter, so that there are 3X2 different ways 
of picking the first and second letters. Since there is only one 
way left in every case for the .selection of the last letter, there are 
therefore 3 X 2 X 1 = (5 different wa 3 -s of picking all the thret* 
letter's. Thus, the number of different permutations of three 
things is 3! = 6. If there had been 10 different letter's, the 
rrumber of different permutations of these would have been 
10X9X8X7X0X5X4X3X2X1 = 10! = 3,628,800. 

Suppose, now, that among 10 different things 3 are to be
selected for some particular purpose, the exact nature of the
purpose being immaterial for the analysis. The question is:
In how many different ways may a subgroup of 3 be selected
from the total of 10; in other words, what is the number of
different permutations that can be made of 10 things taken 3 at a
time? This question may be answered as follows: It is possible
to select the first of the subgroup of 3 in 10 different ways, the
second in 9 different ways, and the third in 8 different ways.
There are thus altogether 10 × 9 × 8 different ways in which
the 3 things may be selected from the total of 10. Accordingly,
the number of different permutations of 10 things taken 3 at a
time is 10 × 9 × 8 = 720. In general, the number of different
permutations of N things taken r at a time is

$$P_r^N = N(N - 1) \cdots \text{ to } r \text{ factors}$$

that is,

$$P_r^N = N(N - 1)(N - 2) \cdots (N - r + 1) \qquad (1)$$
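The counts given above may be checked by direct enumeration. The
following is a minimal sketch in Python, not part of the original
text; the function name permutations_count is an assumption chosen
for illustration, and the library routines math.factorial,
math.perm, and itertools.permutations are used only to confirm the
product written out in Eq. (1).

```python
from itertools import permutations
from math import factorial, perm

# Every arrangement of the three letters m, a, n.
arrangements = ["".join(p) for p in permutations("man")]


def permutations_count(n, r):
    """Eq. (1): N(N - 1)(N - 2)...(N - r + 1), a product of r factors."""
    count = 1
    for factor in range(n, n - r, -1):
        count *= factor
    return count


print(arrangements)               # the 3! arrangements of "man"
print(factorial(10))              # permutations of 10 different letters
print(permutations_count(10, 3))  # 10 things taken 3 at a time
print(perm(10, 3))                # the same count from the standard library
```

Writing the product as an explicit loop keeps the sketch close to
the "r factors" wording of the formula; math.perm serves only as an
independent check.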

Combinations Defined and Illustrated. A “combination” is
not the same thing as a permutation. A group of 3 letters
constitutes a combination of these 3 letters, but, as has just been
seen, this combination can be arranged in 3! different ways. In
other words, it is possible to have 3! permutations of a single
combination of 3 things. In general, it is possible to have N!
permutations of a single combination of N things.

Although a group of N things forms but a single combination,
subgroups may be picked in such a way as to constitute different
combinations. Suppose, for example, that the board of directors
of a given corporation consists of 10 men and the chairman.
The chairman wishes to pick a committee of 3 men. In how
many different ways can such a committee be constituted, the
chairman himself being excluded? This is a question of how
many different combinations of 3 men may be taken from a
group of 10 men. It will be noted that the order of selection
is immaterial, for it is only the constituency of any committee
that differentiates it from other possible committees.

The answer to this question is obtained as follows: Let $C_3^{10}$
represent the number of combinations to be calculated, viz., the
number of different combinations of 10 things taken 3 at a time.
Each one of these combinations, it will be recalled, can be
arranged in 3! different ways; i.e., there are 3! different ways in
which a given committee can be selected. Accordingly, the
total number of ways in which a committee of 3, i.e., just any
committee and not a particular committee, can be chosen is
equal to $C_3^{10} \times 3!$. But the total number of different ways in
which a committee of 3 can be picked from a group of 10 is the
number of permutations of 10 things taken 3 at a time, which is
equal to 10 × 9 × 8. Therefore $C_3^{10} \times 3! = 10 \times 9 \times 8$, and
$C_3^{10} = \frac{10 \times 9 \times 8}{3!} = 120$. In general, the number of
different combinations of N things taken r at a time is

$$C_r^N = \frac{N(N - 1)(N - 2) \cdots (N - r + 1)}{r!} \qquad (2)$$

or, if numerator and denominator are both multiplied by (N - r)!,

$$C_r^N = \frac{N!}{r!\,(N - r)!} \qquad (3)$$
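The committee problem may be verified by listing the committees
outright. The sketch below is illustrative only; the labels 1 to 10
for the directors and the name combinations_count are assumptions
introduced here, not part of the original text.

```python
from itertools import combinations
from math import comb, factorial, perm


def combinations_count(n, r):
    """Eq. (3): N! / (r! (N - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))


directors = range(1, 11)  # hypothetical labels for the 10 directors
committees = list(combinations(directors, 3))

print(len(committees))            # committees of 3, order of selection ignored
print(combinations_count(10, 3))  # the same count from Eq. (3)

# Each committee can be arranged in 3! ways, which recovers the
# number of permutations of 10 things taken 3 at a time.
print(combinations_count(10, 3) * factorial(3), perm(10, 3))
```

The last line restates the argument of the text: multiplying the
number of combinations by r! gives the number of permutations.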


The Binomial Expansion. A use that is made of combinatorial
theory in elementary algebra is to find a formula for the
expansion of the binomial $(x + y)^N$. It will be recalled that
$(x + y)^2 = (x + y)(x + y)$ is found by multiplying each term
of the first factor by each term of the second factor and adding
these partial products. Thus

$$(x + y)^2 = x^2 + xy + xy + y^2 = x^2 + 2xy + y^2$$

A higher powered binomial can be evaluated by mere repetition
of this process. Thus

$$(x + y)^3 = (x + y)(x + y)(x + y) = (x^2 + 2xy + y^2)(x + y)$$

$$= x^3 + 2x^2y + xy^2 + x^2y + 2xy^2 + y^3 = x^3 + 3x^2y + 3xy^2 + y^3$$

It will be noted that the result in each case consists of a series of
terms in diminishing powers of x (or rising powers of y), and this
is generally true no matter what the power of the binomial.
It will also be noted that the number of times a given term
occurs (i.e., its coefficient in the expansion) depends on the
number of ways the x's (or y's) that make up that term can be
selected from the different factors. Thus in the case of $(x + y)^3$
the term composed of three x's, that is, $x^3$, can be formed in only
one way, namely, by taking an x from each of the three factors.
The term $x^2y$, however, which contains two x's, can be formed
in three ways. This is because the number of different
combinations that can be made of three x's taken two at a time is
$\frac{3 \cdot 2 \cdot 1}{(2 \cdot 1)(1)} = 3$. Similarly, the coefficient of $xy^2$ is the number of
different combinations of three x's taken one at a time, which is
$\frac{3 \cdot 2 \cdot 1}{(1)(2 \cdot 1)} = 3$. Accordingly, the expansion of $(x + y)^3$ might
be written $(x + y)^3 = C_3^3x^3 + C_2^3x^2y + C_1^3xy^2 + C_0^3y^3$, where $C_3^3$
means the number of combinations of three things taken three
at a time, $C_2^3$ equals the number of combinations of three things
taken two at a time, etc.,* the evaluation of these quantities to be
determined by Eq. (3). If consideration were given to the
power of y instead of x, this new method of writing the expansion
of $(x + y)^3$ would become

$$(x + y)^3 = C_0^3x^3 + C_1^3x^2y + C_2^3xy^2 + C_3^3y^3$$

* Note that 0! is taken by convention to be 1, so that $C_0^3 = 1$.




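In general,

$$(x + y)^N = C_N^N x^N + C_{N-1}^N x^{N-1}y + C_{N-2}^N x^{N-2}y^2 + \cdots + C_1^N xy^{N-1} + C_0^N y^N \qquad (4)$$

or, on using the second method of expression,

$$(x + y)^N = C_0^N x^N + C_1^N x^{N-1}y + C_2^N x^{N-2}y^2 + \cdots + C_{N-1}^N xy^{N-1} + C_N^N y^N \qquad (4a)$$

Thus

$$(x + y)^4 = C_0^4x^4 + C_1^4x^3y + C_2^4x^2y^2 + C_3^4xy^3 + C_4^4y^4$$

$$= x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4$$

and

$$(x + y)^5 = C_0^5x^5 + C_1^5x^4y + C_2^5x^3y^2 + C_3^5x^2y^3 + C_4^5xy^4 + C_5^5y^5$$

$$= x^5 + 5x^4y + 10x^3y^2 + 10x^2y^3 + 5xy^4 + y^5$$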
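The coefficients in the expansions above may be checked
numerically. The following sketch is illustrative only; the
function names binomial_coefficients and expand_numerically are
assumptions introduced here, and the expansion is evaluated in the
second method of expression, Eq. (4a).

```python
from math import comb


def binomial_coefficients(n):
    """Coefficients C_0^N, C_1^N, ..., C_N^N of (x + y)^N, as in Eq. (4a)."""
    return [comb(n, r) for r in range(n + 1)]


def expand_numerically(x, y, n):
    """Sum the terms C_r^N x^(N-r) y^r of Eq. (4a) for given numbers x and y."""
    return sum(comb(n, r) * x ** (n - r) * y ** r for r in range(n + 1))


print(binomial_coefficients(4))
print(binomial_coefficients(5))

# The term-by-term sum must agree with the binomial raised to the power directly.
print(expand_numerically(2, 3, 5), (2 + 3) ** 5)
```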

It is in this way that the combinatorial formulas enter into the
binomial expansion. Later it will be seen that a certain
frequency distribution is called a “binomial distribution” because
its relative frequencies are computed in the same way as the
coefficients of the terms of a binomial expansion.

MATHEMATICAL PROBABILITY

The concept of probability has been the subject of much
debate among philosophers, mathematicians, and statisticians.
To enter into this debate, however, would be beyond the scope
of this book.¹ Although the concept of probability presented
below appears to be the most suitable for an elementary text
and is apparently the one most in favor among statisticians, it
must not be thought that other approaches are necessarily
invalid or even possibly less fruitful.²

¹ A brief review of the classical theory and the frequency theory of R. von
Mises is presented in the Appendix, pp. 242-251.

² The concept of probability presented in this book is patterned after that
presented by J. Neyman in his Lectures and Conferences on Mathematical
Statistics (Graduate School of the United States Department of Agriculture,
Washington, 1937). His views, Dr. Neyman believes, “are shared by E. S.
Pearson and other workers attached to the Department of Statistics at
University College, London.” He also refers to H. Cramér, Random
Variables and Probability Distributions (Cambridge, 1937); Maurice Fréchet,
Recherches théoriques modernes sur la théorie des probabilités (Gauthier-
Villars, Paris, 1937); A. Kolmogoroff, Grundbegriffe der Wahrscheinlich-
keitsrechnung (Julius Springer, Berlin, 1933); and D. J. Struik, “On the





Definition. A discussion of probability can best begin with a
finite set of objects. Suppose that, in a given set of t objects, m
possess a given property and n do not possess this property.
Then the probability of an object of this set having the given
property is m/t, or the relative frequency of these objects in the
set. The word “object” as used in this definition is to be
interpreted broadly. Besides objects proper, it may be taken
to include events that have the property of occurring or even
propositions that have the property of being true.

To illustrate the above definition of probability, consider an
ordinary deck of 52 playing cards. This will have 26 red cards
and 26 black cards; hence the probability of a red card in this
deck is 26/52 = 1/2. The deck also contains 13 cards of each suit,
so that the probability of a heart, say, is 13/52 = 1/4. This is also
the probability of a diamond, or a spade, or a club.

Description of Fundamental Probability Set. It should be
especially noted that in defining a probability the set of objects
to which it pertains must be precisely designated and the
property of an object to which the probability refers must be
carefully distinguished. For example, the probability of an ace in
a pinochle deck* is 8/48 = 1/6 and not 4/52 = 1/13, as it is in an
ordinary deck. Furthermore, for the same set of cards, the probability
of a card of a given color is not the same as the probability of a
card of a given suit or of a card of a given value. What is more
important is that each of these properties and hence their
probabilities pertain to a different classification of the objects of the
set. As will be seen later, it is possible to add probabilities
pertaining to the same classification of the objects of a given set,
but not probabilities pertaining to different classifications, even
though the set of objects is the same. A set of objects classified
in a given way is called a “fundamental probability set.” In
all calculations it is very important to define carefully the
fundamental probability set that is involved.
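The definition of probability as the relative frequency m/t, and the
dependence of the result on the set and on the property chosen, may
be illustrated by counting. The sketch below is illustrative only;
the card labels and the function name probability are assumptions
introduced here. The pinochle deck is built as described in the
footnote: two each of the ace, king, queen, jack, ten, and nine in
every suit.

```python
from fractions import Fraction

SUITS = ["hearts", "diamonds", "spades", "clubs"]
ORDINARY_VALUES = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
PINOCHLE_VALUES = ["A", "K", "Q", "J", "10", "9"]

# Each card is a (value, suit) pair; the pinochle deck holds every card twice.
ordinary_deck = [(v, s) for s in SUITS for v in ORDINARY_VALUES]
pinochle_deck = [(v, s) for s in SUITS for v in PINOCHLE_VALUES for _ in range(2)]


def probability(objects, has_property):
    """Relative frequency m/t of the objects that possess the property."""
    m = sum(1 for obj in objects if has_property(obj))
    return Fraction(m, len(objects))


def is_red(card):
    return card[1] in ("hearts", "diamonds")


def is_heart(card):
    return card[1] == "hearts"


def is_ace(card):
    return card[0] == "A"


print(probability(ordinary_deck, is_red))
print(probability(ordinary_deck, is_heart))
print(probability(ordinary_deck, is_ace))
print(probability(pinochle_deck, is_ace))
```

Exact fractions are used so that the results can be compared
directly with the ratios quoted in the text.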

In this connection it should be noted that the “probability of 
a heart in an ordinary deck of cards” is not necessarily the same 
thing as the “probability of drawing a heart from the deck.” 

Foundations of the Theory of Probabilities,” Philosophy of Science (1934),
Vol. 1, pp. 50-70.

* A pinochle deck consists of 2 aces, 2 kings, 2 queens, 2 jacks, 2 tens, and 
2 nines of each suit. There are no cards of lower value. 




For the former, the fundamental probability set is precisely
designated; it is simply the given deck of cards classified
according to suit. The total number in the deck may be readily
counted, the hearts may be easily separated from the others,
and their relative frequency, i.e., their probability, may be
directly computed. But what is the fundamental probability
set to which the “probability of drawing a heart from the deck”
pertains? To this there are several answers.

Suppose that 100 drawings are made from the given deck,
the card drawn each time being replaced in the deck and the
whole well shuffled before the next drawing. Let the number
of hearts so drawn be 20. Here the fundamental probability
set is the set of 100 drawings classified according to suit, and
the probability of a heart in this set is 20/100 = 1/5. In this case
also, the total number of objects can be counted and the number
having the given property can be readily ascertained.

The “probability of drawing a heart from the deck” may,
however, pertain to a set of 100 drawings to be made in the
future. Here the total number of “objects” in the set is given,
but there is no way of ascertaining how many of these drawings
will yield hearts. In this case, the “probability of drawing a
heart from the deck” is simply unknown.

Finally, the “probability of drawing a heart from the deck”
may pertain to a set of hypothetical drawings, not actual
drawings. If, it may be argued, 100 drawings should be made from
the deck in the prescribed manner and if 30 of these should be
hearts, then the probability of a heart in this assumed set would
be 30/100. The “probability of drawing a heart from the deck”
refers in this case to a hypothetical set.

Infinite Probability Sets. Frequently, probability theory is
concerned with an infinite set of objects. These are usually
hypothetical sets but may in some cases be real sets, such as the
infinity of points on a line. Without going into mathematical
refinements, it may be said that the probability of an object
of a given property in an infinite set is the percentage of such
objects in the set. For the percentage of a particular kind of
object in an infinite set may be finite even if the number of
objects of the given property and the total number of objects
are both infinite. For example, if a coin is tossed indefinitely,
both the total number of tossings and the number of tossings





yielding heads may be increased without limit. Nevertheless,
the ratio of the number of heads to the total number of tossings
will stay within finite limits no matter how many tossings are
made. For an infinite set, therefore, as well as for a finite set,
the probability of an object having a particular property is the
relative frequency of such objects in the given set.¹

PROBABILITY AND THE RELATIVE FREQUENCY 
OF ACTUAL EVENTS 

In concluding this chapter a few words should be said about
the relationship between mathematical probability and the
relative frequency of actual events. As defined above,
probability is a constant characterizing a given set of objects; it is
merely a mathematical abstraction. If the theory of
probability is to be of any practical use, however, it must be tied to
the relative frequency of actual events. It must help, in other
words, in making predictions about real life.

The Law of Large Numbers. The link that ties mathematical
probability to the relative frequency of real events is actual
experience with mass phenomena. This experience has been
called the “law of large numbers,” which says that, when a large
number of random events is involved, it is usually possible to
predict, with reasonable accuracy, the relative frequency of
occurrence of a particular event by calculating a certain
mathematical probability. To illustrate, consider once again an
ordinary deck of playing cards. Mathematically, this can be
looked upon as a set of 52 objects for which the probability of a
heart is 13/52 = 1/4. Let a large number of drawings, say 1,000, be
made from this deck, the card drawn each time being replaced
and the deck well shuffled before the next drawing. As already
pointed out, no exact statement about the number of hearts
drawn can be made in advance of the drawings. Experience
shows, however, that in random drawings of this kind the relative
frequency of hearts drawn approximates fairly well the
mathematical probability of a heart in the deck. Hence, in the given
instance it may be predicted that of the 1,000 random drawings
something close to 250 will be hearts.
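The drawing experiment just described may be imitated on a
computer. The sketch below is illustrative only and rests on an
assumption not found in the original text, namely, that Python's
pseudo-random generator (random.Random) is an acceptable stand-in
for shuffling and drawing; the function name and the seed value are
likewise hypothetical choices.

```python
import random

SUITS = ["hearts", "diamonds", "spades", "clubs"]
DECK = [(value, suit) for suit in SUITS for value in range(1, 14)]


def relative_frequency_of_hearts(n_draws, seed=0):
    """Draw n_draws cards with replacement and return the share that are hearts."""
    rng = random.Random(seed)  # fixed seed so that the run can be repeated
    hearts = 0
    for _ in range(n_draws):
        _value, suit = rng.choice(DECK)  # draw a card, note its suit, replace it
        if suit == "hearts":
            hearts += 1
    return hearts / n_draws


for n in (100, 1_000, 100_000):
    print(n, relative_frequency_of_hearts(n))
```

With a small number of drawings the relative frequency may wander
noticeably from 1/4; it is only for long series that it can be
expected to settle near the mathematical probability.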

¹ For a more refined definition of probability, see Neyman, op. cit., pp.
10-11.




The foregoing is a very simple illustration of the law of large
numbers. The law appears to be equally valid, however, for
more complicated calculations of probability. For example,
suppose there are two decks of cards, one an ordinary deck and
the other a pinochle deck, and suppose that all possible
combinations of two cards are made by combining one card from the
ordinary deck with one card from the pinochle deck. Since the
first card can be picked in 52 ways and the second in 48 ways,
there will be 52 × 48 = 2,496 such combinations. Of these
2,496 combinations, 4 × 8 = 32 will be pairs of aces; hence in
this set of combinations 32/2,496 = 1/78 is the probability of a
pair of aces.¹ Now let a very large number of drawings be made
from each deck of cards, the card drawn each time being replaced
and the deck well shuffled before the next drawing.
Furthermore, let the first card drawn from the ordinary deck be paired
with the first card drawn from the pinochle deck, the second
card from the ordinary deck with the second card from the
pinochle deck, etc. Then, if the number of random drawings
is very large, experience shows that the pairs of aces actually
occurring in this large set of drawings will be close to 1/78 times
the total number of drawings. Again the relative frequency of
actual events can be approximately predicted by the computation
of a mathematical probability. In fact, if random mass
phenomena are involved, the whole of the calculus of probability can
be employed in the prediction of relative frequencies with
satisfactory accuracy.
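The set of 2,496 combinations is small enough to be written out in
full. The sketch below is illustrative only; the card labels and
variable names are assumptions introduced here, and the pinochle
deck is taken to hold two of each card from the nine to the ace in
every suit.

```python
from fractions import Fraction
from itertools import product

SUITS = ["hearts", "diamonds", "spades", "clubs"]
ORDINARY_VALUES = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
PINOCHLE_VALUES = ["A", "K", "Q", "J", "10", "9"]

ordinary_deck = [(v, s) for s in SUITS for v in ORDINARY_VALUES]
pinochle_deck = [(v, s) for s in SUITS for v in PINOCHLE_VALUES for _ in range(2)]

# Every way of pairing one ordinary card with one pinochle card.
pairs = list(product(ordinary_deck, pinochle_deck))

# Pairs in which both cards are aces.
ace_pairs = [(a, b) for a, b in pairs if a[0] == "A" and b[0] == "A"]

probability_of_ace_pair = Fraction(len(ace_pairs), len(pairs))
print(len(pairs), len(ace_pairs), probability_of_ace_pair)
```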

Empirically Determined Probabilities. It might be pointed
out in passing that in many instances the original set of objects
is not completely known and the probability of a given property
of the set must be determined empirically. For example, the
total number of deaths in the United States of white males age
fifty is not completely known. Indeed, so far as we know,
deaths of men age fifty will continue to occur indefinitely. Thus
of the total number of men who have reached and will reach the
age of fifty, the number who have died or will die during their
fiftieth year is not precisely known. On the basis of the law of
large numbers, however, it seems safe to assume that the many
vital statistics that have been accumulated give a very close
approximation to the true probability of death at age fifty. That

¹ Cf. Chap. X.





this assumption is justified is again verified by actual experience.
Thus, if the empirically determined probability of a man dying
at age fifty and the empirically determined probability of his wife
dying at age fifty are used to calculate¹ the probability of both
a man and his wife dying at the age of fifty, experience with large
masses of data shows that the relative frequency of such pairs
of deaths at age fifty does actually approximate the calculated
probability. The calculus of probability can thus be used by
life-insurance companies with general success. Similar results
have been found true of other empirically determined
probabilities. The law of large numbers thus appears to be universally
valid.²

“Randomness.” It will be noted that the law of large numbers
applies only to mass phenomena that are “random.” This is
very important. If it happened, for example, that, in drawing
pairs of cards from an ordinary deck and a pinochle deck, some
method of selection were used that caused aces to appear in some
cyclical order, say an ace on every tenth draw from the ordinary
deck and on every fifth draw from the pinochle deck, then the
relative number of pairs of aces occurring would not equal the
computed mathematical probability. For, in this case, pairs
of aces would occur on every tenth draw, and the probability
of a pair of aces in the infinite set of drawings would be 1/10 and
not 1/78, as computed above.
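The effect of such a cyclical method of selection may be counted
directly. The sketch below is illustrative only; the function name
and its parameters are assumptions introduced here. It simply marks
an ace on every tenth draw from the ordinary deck and on every fifth
draw from the pinochle deck and counts the draws on which both occur.

```python
from fractions import Fraction


def pair_of_aces_frequency(n_draws, ordinary_cycle=10, pinochle_cycle=5):
    """Relative frequency of ace pairs when aces recur in fixed cycles."""
    pairs = 0
    for draw in range(1, n_draws + 1):
        ace_from_ordinary = draw % ordinary_cycle == 0
        ace_from_pinochle = draw % pinochle_cycle == 0
        if ace_from_ordinary and ace_from_pinochle:
            pairs += 1
    return Fraction(pairs, n_draws)


print(pair_of_aces_frequency(1_000))  # frequency under the cyclical selection
print(Fraction(32, 52 * 48))          # probability computed for random pairing
```

Since every tenth draw is also a fifth draw, the two cycles coincide
on every tenth draw, which is the reason the frequency departs so
widely from the computed probability.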

“Randomness” cannot be exactly defined. Fundamentally,
it is an intuitive concept. General notions suggest that to be
random the occurrence of an event must be related in no way to
its property; e.g., the drawing of an ace must be unrelated to its
being an ace. Nor must a random series of events show any
relationship between the members of the series. In other words,
events must occur in complete disorder; they must be
unpredictable by any formula. But, after all, these are negative

¹ See Chap. X.

² The association between mathematical probability and the relative
frequency of real events is not essentially different from the association
between mathematical models in other sciences and happenings in the real
world. In physics, for example, the closeness of the association is good
enough to enable mathematical formulas to be used in the construction of
bridges, automobiles, and the like. In other words, the justification of the
theory is that it works.




criteria. The positive content of randomness must be left
undefined.¹

APPENDIX

A REVIEW OF THREE IMPORTANT CONCEPTS OF PROBABILITY 
As pointed out in the main body of this chapter, various concepts of
probability are admissible. Altogether there appear to be three principal
concepts that have contended for acceptance by scientists and philosophers.
These may be described as the “classical concept,” the “frequency concept,”
and the “intuitive-axiomatic approach” to probability. It is this last
that is used in this book. Since it is an outgrowth of the conflict between
the other two lines of thought, they will be discussed first.

Classical Concept of Probability. Historical Background. Although
commercial insurance was practiced by the Babylonians and was well known
to the Greeks and Romans, the development of a theory of probability,
such as that on which modern insurance practice is based, dates back only
to the seventeenth century. Furthermore, it was not in the field of business
that the seeds of this probability theory were sown but in the gambling
rooms of the French gentry. In 1654 Antoine Gombaud, chevalier de
Méré, a French gentleman with an interest in mathematics, called upon the
French mathematician Pascal for the solution of a particular gambling
problem. The ensuing mathematical speculation marked the beginning
of the investigation of games of chance. Subsequently there appeared
various works by Huygens (1657), Jacques Bernoulli (1713), De Moivre
(1718), and Bayes (1764), most of which were concerned with the
application of the theory of permutations and combinations to the calculation of
probabilities associated with various dice and card games.

Meanwhile, French and English experimentalists, mathematical
physicists, and astronomers were concerning themselves with errors of
measurements. Simpson (1757) examined the implications of taking the mean of
a set of astronomical measurements as the best estimate of the true value,
and Lagrange (1770) published a memoir dealing with the “probable
error” of the mean.² Other names associated with the early development
of the theory of errors are Boscovich, Lambert, Euler, Daniel Bernoulli,
and Legendre.³ The development of such concepts as “inverse probability”
and probability of “causes” also led at this time to growing philosophical
speculations on the theory of probability. Furthermore, the collection of
mortality statistics led to the computation of mortality tables and the
development of actuarial science.

All these investigations (the analysis of gambling games, the formulation
of a theory of errors, and philosophical speculation) reached their
culmination in the great work of Laplace, Théorie analytique des probabilités (1812).

¹ See Smith and Duncan, Sampling Statistics, pp. 155-162, for a discussion
of various methods employed to get random samples.

² Cf. Levy, H., and L. Roth, Elements of Probability (1936), pp. 5-6.

³ Cf. Nagel, E., Principles of the Theory of Probability, p. 10.




This master synthesis contains all the essentials of the classical theory of
probability and most of the important deductions from it. From the time
of Laplace, developments of probability theory in the fields of philosophy;
logic; mathematics; physical, chemical, and biological research; and the
social and industrial arts and sciences were all bound to react on each other
and to build on the same broad foundation.¹ In the sense that he thus
fused together the various lines of development, Laplace may be looked upon
as the formulator of the classical theory of probability.

The Classical Concept. The definition of probability given by Laplace
and generally adopted by disciples of the classical school runs as follows:
Probability, it is said, is the ratio of the number of “favorable” cases to the
total number of equally likely cases. For example, if a coin is tossed, there
are two equally likely results, a head or a tail; hence the probability of a
head is 1/2. If a die is thrown, there are six equally likely results, and the
probability of any particular one of these results, say a five, is therefore 1/6.
Or again, if a die is thrown, the probability of obtaining an even number is
3/6 = 1/2, for three of the six equally possible results are even numbers. This
last example illustrates how the classical theory derived the addition
theorem. For it will be noted that the probability of getting a particular
one of the even numbers is in each case 1/6. But it has just been shown that
the probability of any even number is 3/6 = 1/6 + 1/6 + 1/6; hence, the theorem
follows that the probability of any one of a number of mutually exclusive
events is the sum of their individual probabilities.
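The addition theorem for the die may be verified by counting cases.
The sketch below is illustrative only; the names faces and
probability are assumptions introduced here, and the six faces are
taken as equally likely in the sense of the classical definition.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]  # the six equally likely results of a throw


def probability(favorable):
    """Classical ratio: number of favorable cases over number of equally likely cases."""
    return Fraction(len(favorable), len(faces))


even_faces = [face for face in faces if face % 2 == 0]

# Probability of an even number, counted directly.
p_even = probability(even_faces)

# The same probability as the sum over the mutually exclusive results 2, 4, and 6.
p_sum = sum(probability([face]) for face in even_faces)

print(p_even, p_sum)
```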

Still another example will illustrate how the classical concept led to the
multiplication theorem. Suppose three coins are tossed. Since either
one of two results on the first coin can be combined with either one of two
results on the second coin and any one of these combinations can be
combined with either one of two results on the third coin, there are altogether
2 × 2 × 2 = 8 equally possible results. The number of these eight possible
combinations that have all three heads is 1. Hence, the probability of all
three heads on the tossing of three coins is 1/8. But this is the same as
the product of the individual probabilities of a head on each coin, i.e.,
(1/2)(1/2)(1/2) = 1/8. In general, the probability of the joint occurrence of
independent events² is the product of their individual probabilities. These are
all illustrations of how the classical theory of probability, in line with its
definition of the term, sought in every case to resolve a problem into a set
of equally likely cases and then by the application of combination formulas
to determine the number of “favorable” cases.
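The eight equally possible results for three coins may be listed and
the multiplication theorem checked against them. The sketch below is
illustrative only; the letters H and T and the variable names are
assumptions introduced here.

```python
from fractions import Fraction
from itertools import product

# All results of tossing three coins: HHH, HHT, ..., TTT.
outcomes = list(product("HT", repeat=3))

# The single favorable case in which every coin shows a head.
all_heads = [outcome for outcome in outcomes if outcome == ("H", "H", "H")]

p_joint = Fraction(len(all_heads), len(outcomes))  # favorable over equally likely cases
p_product = Fraction(1, 2) ** 3                    # product of the three individual probabilities

print(len(outcomes), p_joint, p_product)
```

The agreement of the two figures depends on the independence of the
coins, as the footnote to the theorem points out.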

Criticism of the Classical Concept. There is little criticism of the theo- 
rems built up by the calculus of probabilities on the basis of the classical 
definition insofar as they represent merely a set of logical relationships. 
Generally, the same set of relationships can be demonstrated on the basis 
of other definitions of probability. Criticism of the classical concept centers 

¹ Cf. Levy and Roth, op. cit., p. 8.

² The independence of the individual events is necessary for this theorem
to hold true. In the given example, the probability of getting a head on
any one coin is independent of the results obtained on the other two.




rather in the meaning of the results obtained and the adequacy of the theory
for handling problems outside the field of gambling games, as in the
statistical analysis of physical, biological, and economic data.

Meaning of “Equally Likely.” The principal line of attack on the
classical concept is directed against the terms “equally likely cases.”
What does “equally likely” mean, it is asked. Is not “equally likely”
merely another way of saying “equally probable,” and in that case is not
the classical definition of probability a circular one, since it defines
probability in terms of itself? To avert criticism, some rule must be laid down for
the determination of “equally likely” or “equally probable” that is
independent of “probable.”¹ What then were the rules of the classicists for
determining equal likelihood?

In the development of the classical theory, two procedures were offered
for the determination of “equally likely” cases. One was the principle
of sufficient reason and the other the principle of indifference, or the principle
of the equal distribution of ignorance, as it was sometimes called. The first
procedure was followed when a person examined all available evidence
relevant to the event in question and noted that this evidence was
symmetrical with reference to the various possible results. For example, after a
thorough examination of a die, including a nice determination of its center
of gravity and the moments of inertia about various sides, it might be
concluded that the investigator had sufficient reason to consider the die perfectly
symmetrical and hence the six possible results equally likely. According
to the principle of indifference, on the other hand, if the investigator knew
nothing about the die in question, he had no basis for deeming one side of
the die to be different from any other and could therefore assume them all
to be equally likely. This second procedure is subject to particularly
severe criticism and will be discussed first.

Principle of Indifference. Total ignorance about a thing, it may
reasonably be argued, can scarcely be a source of any knowledge concerning it,
even of that uncertain kind afforded by a probability statement. In other
words, how can something be got out of nothing? As might be expected,
the use of the principle of indifference has frequently led to paradoxical
results. This has been generally true whenever the set of “equally likely”
cases was not discrete but was represented by a continuous variable.
Suppose, for example, it is known that the weight of a certain man is at least
equal to that of his wife but is not more than double her weight. If
ignorance as to the exact ratio of weights is “evenly distributed,” it may be
concluded that any ratio of the man's to the woman's weight lying between
1 and 2 is as likely as any other within that interval. From this it follows
that the probability of its lying between 1 and 1.5 is 50 per cent and that the
probability of its lying between 1.5 and 2 is also 50 per cent. Suppose,
however, the ratio of the woman's to the man's weight had been taken as
the variable. The limits would then be 0.5 and 1, and according to the
principle of indifference all possible values of the ratio lying within this range
might be deemed equally likely. Then, however, it may be concluded that


¹ Cf. Nagel, op. cit., p. 46.





the probability of the ratio of the woman's weight to the man's weight lying
between 0.5 and 0.75 is just 50 per cent and that the probability of its lying
between 0.75 and 1.0 is also 50 per cent. But this second result is in
disagreement with the first, for a ratio of the woman's to the man's weight
equal to 0.75 is the equivalent of a ratio of the man's weight to the woman's
weight equal to 1.33. Thus, according to the second method of distributing
ignorance, the probability is 50 per cent that the man is 1 to 1 1/3 times as
heavy as the woman; and, according to the first method, the probability
is 50 per cent that he is 1 to 1 1/2 times as heavy. In general, when the
principle of indifference is employed to determine what is equally likely,
a change in the coordinates used to describe a continuous variation in
possible results frequently affects the values of the computed probabilities.¹
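The disagreement between the two methods of distributing ignorance
may be put in figures. The sketch below is illustrative only; the
function name uniform_probability and the variable names are
assumptions introduced here. It computes the probability of one and
the same event, that the man is between 1 and 1 1/2 times as heavy
as the woman, first with the man-to-woman ratio evenly distributed
between 1 and 2, and then with the woman-to-man ratio evenly
distributed between 1/2 and 1.

```python
from fractions import Fraction


def uniform_probability(low, high, a, b):
    """P(a <= X <= b) when X is evenly distributed over the interval [low, high]."""
    a, b = max(a, low), min(b, high)
    if b <= a:
        return Fraction(0)
    return Fraction(b - a) / Fraction(high - low)


# First method: the ratio man/woman is evenly distributed on [1, 2].
p_first = uniform_probability(Fraction(1), Fraction(2), Fraction(1), Fraction(3, 2))

# Second method: the ratio woman/man is evenly distributed on [1/2, 1].
# "man/woman between 1 and 3/2" is the same event as "woman/man between 2/3 and 1".
p_second = uniform_probability(Fraction(1, 2), Fraction(1), Fraction(2, 3), Fraction(1))

# Under the second method, 50 per cent falls on man/woman between 1 and 4/3,
# i.e., on woman/man between 3/4 and 1.
p_second_half = uniform_probability(Fraction(1, 2), Fraction(1), Fraction(3, 4), Fraction(1))

print(p_first, p_second, p_second_half)
```

The same event thus receives two different probabilities, according
to which ratio is chosen as the evenly distributed variable.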

Principle of Sufficient Reason. The first method of procedure, viz., that
of sufficient reason, puts the theory on a more solid basis. It has been
criticized, however, in respect to its practical applicability. Even after
the symmetry of a die has been carefully determined, it is still necessary to
note any lack of bias in the method of throwing it or again in the surface
on which it rolls. To determine these a priori is a matter more difficult
than determining the symmetry of the die itself. Much greater, however,
is the difficulty of determining equal likelihood when attention is turned
from dice, coins, and cards to the phenomena of the scientific laboratories
and of everyday life. An insurance company insures the lives of a thousand
men; how can it determine a priori whether they are all equally likely to
die during the year? Some men are tall, others short; some are fat, some
thin; some work outdoors, others indoors. Is there any possibility of the
insurance company telling, other than from its actual experience with men
of various classes, i.e., a posteriori, whether these individual differences
destroy the equal likelihood of death? Critics of the classical theory answer
with an emphatic “no.” It is easily seen, they say, why the classical theory
developed out of a study of games of chance, for it is in that field alone that
there is any reasonable possibility of determining a priori whether a set of
possible results are “equally likely.” In most other fields it is impossible.²

But why insist on a rational a priori determination of equal likelihood,
the reader may ask. Why not determine it a posteriori? For example, to
determine whether a head or a tail is equally likely for a given coin, why
not toss the coin a large number of times and note whether the number of
heads is approximately equal to the number of tails? This sounds simple
enough until it is studied more closely. Then the question immediately
arises: How good an approximation is necessary (how close to 0.5 must the
ratio of heads to total tosses be) before it can be concluded that a head and
a tail are equally likely? On the assumption of equal likelihood, the classical
theory itself explains that in a finite number of tosses any given result is
possible, although of course all results are not equally probable. If N
tosses are made, for example, there are $2^N$ equally possible results,³ and of

¹ Cf. von Mises, Richard, Probability, Statistics, and Truth (1939), pp.
111-115.

² Cf. von Mises, op. cit., pp. 98-110.

³ Cf. p. 274.




these the number that would have r heads would be the number of combinations
of N things taken r at a time,¹ or $\binom{N}{r} = \frac{N!}{r!\,(N-r)!}$.
Hence, any value of r/N would be possible, with a probability of
$\binom{N}{r}\left(\tfrac{1}{2}\right)^{N}$. If N is large, the probability
that r/N should deviate considerably from 0.5 is very small, and the general
reasonableness of the hypothesis of equal likelihood could be tested by
determining the probability of as large a deviation from 0.5 as is actually
found.²
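
The test just described can be sketched numerically. The following is only an illustration under the classical assumption of an unbiased coin; the function names are hypothetical and are not part of the text. It computes the probability of exactly r heads in N tosses and the probability of a deviation from 0.5 at least as large as the one observed.

```python
from math import comb


def prob_heads(n: int, r: int) -> float:
    """Probability of exactly r heads in n tosses of an unbiased coin."""
    return comb(n, r) / 2 ** n


def prob_deviation_at_least(n: int, r_observed: int) -> float:
    """Probability that r/n lies at least as far from 0.5 as the observed ratio."""
    # Compare |2r - n| instead of |r/n - 0.5| so the comparison stays in integers.
    observed = abs(2 * r_observed - n)
    return sum(
        prob_heads(n, r) for r in range(n + 1) if abs(2 * r - n) >= observed
    )


if __name__ == "__main__":
    print(prob_heads(10, 5))
    print(prob_deviation_at_least(100, 60))
```

A small value returned by `prob_deviation_at_least` casts doubt on the hypothesis of equal likelihood, but, as the next paragraph argues, it can never settle the matter with certainty.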

The empirical results therefore do not provide a certainty that a head
or a tail is equally likely. There is no way of telling for sure whether the
deviation from an exact value of 0.5 is due merely to chance or whether the
coin is actually biased. A similar result might have been produced by a
biased coin. In fact, the latter might on occasion produce an exactly equal
number of heads and tails, so that, even if the ratio of r to N is exactly 0.5,
it is not certain that the coin is perfectly unbiased. If, as suggested above,
the hypothesis of equal likelihood is accepted or rejected on the basis of the
probability of getting the given deviation from 0.5, then equal probability
is being determined in a way that is dependent on the concept of probability,
and the criticism of circularity in the definition becomes immediately valid.³

A still further criticism is this: Suppose that after a careful examination
of a die it is found that the die is not symmetrical; what then? If a die is
biased, how can the problem be resolved into a set of equally likely cases?
True, if it can be determined, through careful weighing, balancing,
rotating, etc., that the occurrence of an even number is twice as likely as the
occurrence of an odd number, it might be argued that there are nine equally
likely results, one of which is a one, two of which are twos, one of which is a
three, two of which are fours, one of which is a five, and two of which are
sixes, and the various combinatorial formulas might be based on this
assumption. Even if the possibility of making such an a priori determination
is not questioned, there still remains the problem of how to treat the case
where the bias is such that a given result or results is 1.5 or 3.67 or r times
as likely as some other result. Laplace himself attempted a solution of this
problem but failed to obtain a correct answer.⁴ It would seem that the
problem is insoluble on the basis of the classical concept.⁵
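
The device of nine equally likely results can be made concrete with a short sketch. It assumes whole-number weights, as in the example above (even faces counted twice, odd faces once); the names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def die_probabilities(weights: dict) -> dict:
    """Treat each face as weights[face] equally likely cases out of the total."""
    total_cases = sum(weights.values())
    return {face: Fraction(w, total_cases) for face, w in weights.items()}


# Even faces are assumed twice as likely as odd faces: 1+2+1+2+1+2 = 9 cases.
weights = {1: 1, 2: 2, 3: 1, 4: 2, 5: 1, 6: 2}
probabilities = die_probabilities(weights)
prob_even = sum(p for face, p in probabilities.items() if face % 2 == 0)

if __name__ == "__main__":
    print(probabilities)
    print(prob_even)
```

The sketch depends on the weights being whole numbers, since each weight is read as a count of equally likely cases.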

Subjective Character of Classical Concept. Since the foregoing criticisms
have in a way implied that “probability” was more or less objective in
character, in all fairness to the classical theory it should be pointed out
that Laplace and most of his followers took a subjective view of the concept.

¹ Cf. p. 234.

² Cf. pp. 248-249 for a further discussion of this.

³ The “frequency” concept expounded below does not suffer from this
criticism because, according to this concept, probability is defined as the
limit that the ratio of heads to total results approaches as the number of
tosses is indefinitely increased; the circularity arises from the need of
determining equal probability. See pp. 250-251.

⁴ von Mises, op. cit., p. 102.

⁵ Cf. Nagel, op. cit., p. 45.





Probability to them was a “rational degree of belief.” They considered
the word “probability” as meaning a state of mind regarding a given
statement, a future event, or any other thing about which absolute
knowledge was not to be had.¹ It was not made clear, by the classicists,
however, just how subjective their concept of probability was. If it were a
mere measure of degree of (psychological) belief, the theory of probability
became a part of the science of psychology and immediately the
question arose as to how degrees of belief could be added and multiplied,
as called for by the probability calculus. On the other hand, if “rational
degree of belief” was to be interpreted as what every intelligent person
ought to believe under the given circumstances, then the theory assumed
a certain degree of objectivity and all the foregoing criticisms became
applicable with respect to the exact content of this standard of “oughtness.”²
It is the contention of the critics of the classical concept that, if probability
theory is to be of practical use in statistical science, some objective definition
of probability should be adopted. One writer points out that physical
thermodynamics had its starting point in the subjective impressions of
hot and cold but that its development began when an objective method was
used to compare temperatures by means of a column of mercury.³ In the
same way, he concludes, probability should be put upon a physical,
objective basis.

Frequency Concept of Probability. If the reader will toss an ordinary
coin a large number of times, he will see that the ratio of the number of
heads to the total number of tosses will be close to 0.5000, and approximately
the same result will be obtained each time the experiment is repeated. This
empirical fact, that in mass phenomena the relative frequency of a given
attribute often appears to approximate a definite constant, is the
cornerstone of the frequency concept of probability. The constant value that
the relative frequency tends to approximate is identified with the
“probability” of the given attribute.
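
The experiment can be imitated on a computer. The sketch below is only an illustration: it substitutes Python's pseudo-random number generator for a physical coin (an assumption, since such a generator is itself a deterministic device), and the function name and seeds are hypothetical.

```python
import random


def relative_frequency_of_heads(n_tosses: int, seed: int) -> float:
    """Simulate n_tosses of a fair coin and return the proportion of heads."""
    rng = random.Random(seed)  # seeded so that a run can be repeated exactly
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


if __name__ == "__main__":
    # Repeating the experiment with different seeds gives nearly the same ratio.
    for seed in (1, 2, 3):
        print(seed, relative_frequency_of_heads(100_000, seed))
```

Each repetition should yield a ratio close to 0.5, which is the stability of relative frequency that the frequency concept takes as its starting point.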

This frequency approach to the theory of probability goes back at least
to the work of J. Venn on the Logic of Chance, published in London in 1866.
In the present day, a leading exponent of this view is Richard von Mises,
whose writings⁴ constitute one of the most important formulations of the
frequency theory of probability. The next section will be devoted to an
exposition of his ideas. A subsequent section will discuss the
“intuitive-axiomatic” approach, which is similar to von Mises’ theory but
differs somewhat in its logical basis.

Concept of von Mises. In von Mises’ theory, probability is defined only
with reference to what he calls a “collective.” This is an infinitely large
set of “random” elements that possess certain specified characteristics.

¹ Cf. Nagel, op. cit., p. 44.

² Cf. Nagel, op. cit., p. 46.

³ von Mises, op. cit., p. 112.

⁴ The most important of these are Wahrscheinlichkeitsrechnung (Leipzig,
1931) and Probability, Statistics, and Truth (1939), the latter being a
translation of an earlier (1928) German edition.




The sequence of results obtained from an indefinite tossing of a coin or
throwing of a die, the set of human births running back into the indefinite
past and projected into the endless future, the sequence of parts turned
out by the continuous operation of a given manufacturing process,¹ all
these are examples of collectives. If the elements of such sequences take
on varying attributes such as heads or tails, male babies and female babies,
acceptable and unacceptable parts, the first essential characteristic of a true
collective is that the relative frequency with which a particular attribute
occurs shall approach a fixed limit as the number of elements in the collective
is indefinitely increased. Mathematically this means the following: If r/N
is the relative frequency of a given attribute among N elements, e.g., the
proportion of heads among N tossings of a coin, and if p is the limit that
r/N approaches as N is increased indefinitely, then after some point, say
N = 1,000,000, the difference between r/N and p becomes, and thereafter
remains, less than an arbitrarily chosen positive quantity ε, say 0.005.
Numerically it means that if r/N is calculated to a given number of decimal
places, as N is increased a point is finally reached after which further
increases bring no change in the calculated figures. For example, if the
ratio of heads to total tossings is calculated to three decimals, then, after
some number of tossings, this ratio will always give r/N = 0.500. The
limit that the relative frequency of a given attribute approaches as the
number of elements of a collective is indefinitely increased is defined as the
probability of that attribute in the given collective.

The second characteristic of a true collective is its “randomness.” Thus
the sequence of elements constituting a collective must be free from any
regularity; they must be in complete disorder. It is to be noted that the
relative frequency of an attribute may approach a limit in a given sequence
without that sequence being a random one. If, for example, some special
apparatus were constructed so that every fifth tossing of a coin resulted in a
head and every other tossing in a tail, the sequence of results would look
as follows:

TTTTHTTTTHTTTTHTTTTHTTTTHTTTTHTTTTHTTTTH

The limit of relative frequency of heads in such a sequence would be 1/5, but
the sequence is obviously not a random one. It is consequently not a true
collective, and it cannot be said that the probability of a head under the
given conditions is 1/5. Actually the probability of a head on every fifth
tossing is 1 and on every other tossing it is 0.

What precisely constitutes “randomness”? von Mises’ answer to this
fundamental question is as follows: If subsequences of elements are picked
from the original sequence in such a way that the selection of a particular

¹ In reality manufacturing processes change materially from time to time.
What is envisaged here is one that remains exactly the same indefinitely.
For a discussion of the statistical aspects of manufacturing processes, see
W. A. Shewhart, Statistical Method from the Viewpoint of Quality Control
(Graduate School of the United States Department of Agriculture,
Washington, 1939).





element is independent of the attribute assumed by that element and if in 
all possible subsequences of this kind the limit of relative frequency of a 
given attribute is the same as in the original sequence, the latter may be
said to be random. In the truly random tossing of a coin, for example, the
selection of every fifth tossing would yield a subsequence of tossings in
which the relative frequency of heads would approach the same limit (1/2
if the coin is unbiased) as in the complete sequence. If such a method of
selection were applied to the particular sequence of heads and tails given
above, the result would obviously be a subsequence consisting of all heads,
for which the limit of relative frequency would be 1 and not 1/5 as in the
original sequence. This sequence clearly fails to meet the test of
randomness. Another example of randomness is provided by the game of roulette.
If the results of the game are influenced solely by chance forces, i.e., if they
constitute a truly random sequence, there is no way of placing bets so as to
secure better than average results; no formula can be devised to “beat the
house.” As von Mises puts it, the existence of randomness means the
impossibility of devising a gambling “system.”
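
The failure of the periodic sequence under place selection can be checked directly. The following sketch uses a finite stretch of forty tossings as a stand-in for the infinite sequence (an assumption made only so that the example can be run); the function names are hypothetical.

```python
from fractions import Fraction


def relative_frequency(sequence: str, attribute: str = "H") -> Fraction:
    """Proportion of elements in the sequence that show the given attribute."""
    return Fraction(sequence.count(attribute), len(sequence))


def every_kth(sequence: str, k: int) -> str:
    """Place selection: keep the k-th, 2k-th, 3k-th, ... elements only."""
    return sequence[k - 1::k]


# Every fifth tossing is a head, every other tossing a tail.
periodic = "TTTTH" * 8

if __name__ == "__main__":
    print(relative_frequency(periodic))                # whole sequence
    print(relative_frequency(every_kth(periodic, 5)))  # every fifth tossing
```

The selection depends only on the position of each element, never on its attribute, yet it changes the relative frequency of heads from 1/5 to 1; that is exactly why the sequence is not random in von Mises' sense.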

In summary, then, a true collective is a mass phenomenon or an endless
sequence of observations for which (1) the relative frequencies of the
particular attributes of the elements of the collective tend to fixed limits and
(2) those fixed limits are the same for any place selection of a subsequence,
i.e., a selection that depends only on the location of an element in the
collective and not on the attribute it assumes. The existence of such a
collective is the fundamental postulate of von Mises’ theory of probability.

Criticism of von Mises’ Concept. Since the initial formulation of his
theory in 1919,¹ von Mises’ concept of probability has been the subject of
considerable discussion. Some have sought to refine and elaborate von
Mises’ views; others have contended that they contain serious logical
inconsistencies.² Here only a brief mention will be made of these criticisms.

Since in real life only finite series can be observed, there has been some
objection to a concept of probability based upon the notion of infinite
sequences. One writer³ has attempted to work out von Mises’ ideas, using
finite series, but the complications are great and the results are not so
comprehensive. After all, the concept of an infinite series only aims to
give approximate results. It has been a useful mathematical tool in many
other sciences; so why not in probability?⁴

Some writers have attacked the existence of limiting values.⁵ For
example, in accordance with the classical theory of probability, if an unbiased
coin is tossed N times, there is always a probability, however small, that the

¹ See Mathematische Zeitschrift, Vol. 5.

² See list of references in von Mises’ Probability, Statistics, and Truth, pp.
316-318, references 35-51.

³ Blume, Johannes, Zur axiomatischen Grundlegung der
Wahrscheinlichkeitsrechnung, 1934 (Dissertation, Münster); Zeitschrift für
Physik, Vol. 92 (1934), pp. 232-252; Vol. 94 (1935), pp. 192-203.

⁴ Cf. von Mises, Probability, Statistics, and Truth, pp. 121-122.

⁵ Cf. Fry, T. C., Probability and Its Engineering Uses, pp. 88-91.




coin will turn up heads in all, or in a large proportion of, the N tossings.
It is consequently argued that, whatever point is selected in the infinite
sequence of tossings, there is always the possibility that the next N tossings
will turn up such a large proportion of heads that the ratio of heads to total
tossings will differ from the supposed limit p by more than the arbitrarily
selected quantity ε and hence contradict the mathematical criterion for the
existence of a limit.¹ The answer to this criticism is that it is based upon
another (the classical) concept of probability and is a proposition concerning
the possible results obtained from a finite number of tossings. It is not
in contradiction to another proposition that begins with a different view of
probability and postulates the existence of a limit in an infinite sequence of
tossings.²

More serious criticism of von Mises’ theory has been directed against the
condition of randomness. There is the question, for example, whether a
series that is in complete disorder and cannot accordingly be described by a
mathematical formula can logically be conceived to exist. Still further,
there is the question whether limits to relative frequencies in an infinite
sequence can coexist with von Mises’ definition of randomness. Recent
mathematical investigations, however, appear to have resolved this
difficulty. It is claimed that with a more carefully drawn and slightly less
comprehensive definition of randomness the type of collective described by
von Mises can for all practical purposes be conceived to exist.³

THE INTUITIVE-AXIOMATIC APPROACH TO PROBABILITY 

The theory of probability presented in the main body of this text is based
upon the intuitive notion that theorems derived from axioms relating to
relative frequencies approximate satisfactorily the occurrence of events in
real life. It may therefore be called the “intuitive-axiomatic” approach
to probability. It is the concept of probability accepted by such men as
Neyman, Fréchet, and Kolmogoroff.⁴ Lying between the classical concept
of Laplace and the pure frequency theory of von Mises, it may be called
a “compromise” concept.

¹ See p. 248.

² Cf. von Mises, Probability, Statistics, and Truth, pp. 126-128, and the
whole of the fourth lecture. Also, see Nagel, op. cit., pp. 37.

³ The fundamental papers supporting this claim are those of A. H.
Copeland, American Journal of Mathematics, Vol. 50, pp. 535-552; Vol. 51, pp.
612-618; Vol. 53, pp. 153-162; and Vol. 58, pp. 181-192; and a paper by
A. Wald, Ergebnisse eines mathematischen Kolloquiums, Wien, No. 8, pp.
38-72. See, however, the recent criticism of Maurice Fréchet, Journal of
Unified Science, Vol. 8, pp. 1-22. He refers to an example by Ville in
which a given set is so defined that it meets all the conditions laid down by
Wald and yet contains the regularity that the relative frequency always
converges to its limit by values that are greater than p. He admits,
however, that von Mises does not feel that this creates any difficulty in his
theory.

⁴ See footnote (2), p. 236.





The intuitive-axiomatic approach to probability differs from the classical
theory in that it avoids the circularity of “equal likelihood.” It merely
defines probability as a relative frequency of a certain attribute in a given
set of objects without any statement as to whether these are “equally
likely.” It consequently avoids introducing any subjective elements into
the definition of probability.¹

On the other hand, the intuitive-axiomatic approach differs from von 
Mises’ approach in that it does not identify probability with the mathe- 
matical limit approached by the relative frequency of a given attribute in 
an infinite random sequence of actual events. It may indeed take the
relative frequency of a certain attribute in a hypothetical infinite set as the
probability of that attribute, but this need not be the mathematical limit
approached by the relative frequency of any actual type of event. It
merely says that, if relative frequencies of random mass data are treated as
if they were mathematical probabilities of some hypothetical infinite set,
then the calculations derived on this assumption will be satisfactorily close
to the relative frequencies of various combinations of these data.
“Randomness” is left as an undefined, intuitive concept.

¹ The subjective elements and even the concept of equal likelihood enter
to some extent, however, in the determination of “randomness” (see pp. 241-
242).



CHAPTER IX

PROBABILITY DISTRIBUTIONS 


A description of a fundamental probability set giving the
various categories, or classes, into which the members of the set
are grouped, together with the probabilities of each group, is
called a probability distribution. The property of a member of a
given group may be spoken of as an “attribute,” and a probability
distribution may be said to show how the total probability
is distributed among the various attributes. Since the attributes
of a fundamental probability set are necessarily mutually
exclusive and since a member of a set must possess some one of the
given attributes, it follows that the total probability, i.e., the
sum of the probabilities of the various attributes, must equal 1.
In other words, the percentages (probabilities) of the cases falling
in the various groups must add up to 100 per cent.
For example, the distribution of probability among the four
suits of an ordinary playing card deck is


Spades      1/4 or 25 per cent
Hearts      1/4 or 25 per cent
Diamonds    1/4 or 25 per cent
Clubs       1/4 or 25 per cent


The quality of being a spade, heart, diamond, or club is the
attribute of a card. Since a card cannot be both a spade and a
heart, say, these attributes are mutually exclusive, and since a
card must belong to one of the four suits, the total probability is 1.
Similarly, the distribution of probability among the six faces
of an ordinary die is 1/6 for each of the faces 1, 2, 3, 4, 5, and 6.





The quality of being a 1, 2, 3, 4, 5, or 6 is the attribute of a face.
Since a face of a die cannot be both a 2 and a 6, say, these
attributes are mutually exclusive, and since a face must have one
of the markings listed above, the total probability is again 1.
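
The requirement that the probabilities of mutually exclusive attributes add up to 1 can be verified for both examples. This is only a sketch, assuming a standard 52-card deck with 13 cards in each suit and a symmetrical six-faced die; the helper name is hypothetical.

```python
from fractions import Fraction

# Each suit holds 13 of the 52 cards; each face is 1 of the 6 faces of a die.
suits = {s: Fraction(13, 52) for s in ("spades", "hearts", "diamonds", "clubs")}
faces = {f: Fraction(1, 6) for f in range(1, 7)}


def is_probability_distribution(distribution: dict) -> bool:
    """True if no probability is negative and the probabilities total exactly 1."""
    values = list(distribution.values())
    return all(p >= 0 for p in values) and sum(values) == 1


if __name__ == "__main__":
    print(suits["hearts"], is_probability_distribution(suits))
    print(faces[3], is_probability_distribution(faces))
```

Exact fractions are used instead of decimals so that the total can be compared with 1 without rounding error.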

Discrete Probability Distributions. When the attributes of a
set are qualitative in character, such as spades or hearts in the
case of a deck of cards or heads or tails in the case of coins, or
when they are represented by a set of numerical values that do
not vary continuously, such as the number of spots on the face
of a die, the distribution of probability is said to be “discrete.”
If the attributes are represented by points on a horizontal axis
and their probabilities measured along a vertical axis, a discrete
probability distribution may be pictured by a series of lines or
bars as in Figs. 77 and 78. It will be noted that it is the height
of the bar in each case that measures the probability of the
attribute at which it is erected.

[Fig. 77. Distribution of probability of heads and tails on a coin.]

[Fig. 78. Distribution of probability of the spots on the faces of a die.]

Continuous Probability Distributions. If the members of a 
set consist of the numerical figures obtained by the repeated 
measurement of the length of a given table or the continued 
measurement of the heights of adult white males, living now and 
in the future, the attributes assumed by the members may form a 
continuous variable. In such a case, the total probability of 
1 can be considered to be distributed over the whole range 
of variation; it will thus form a “continuous” distribution of 
probability. More exactly, the range may be divided into small
class intervals, and location within one of these intervals may be 
taken as the attribute of a member of the set. In this instance, 
the probability of a member belonging to a given interval may be 
represented by the area of a rectangle erected over that interval, 





and the total distribution may be pictured as a set of such
rectangles in the manner shown in Fig. 79. If, now, the class
intervals are made smaller and smaller, the tops of these rectangles
will tend to sketch out a smooth curve (cf. Fig. 80). A probability
curve of this sort can be looked upon as the limit approached as the
class intervals into which the range is divided are made
infinitesimally small.
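
The passage from rectangles to a smooth curve can be followed numerically. The sketch below is an illustration only: it assumes, purely for the sake of example, that heights follow a normal distribution with a mean of 68 inches and a standard deviation of 3 inches (figures not taken from the text), and its function names are hypothetical.

```python
from math import erf, exp, pi, sqrt

MEAN, SD = 68.0, 3.0  # hypothetical mean and standard deviation, in inches


def cdf(x: float) -> float:
    """Probability of a height below x under the assumed normal distribution."""
    return 0.5 * (1.0 + erf((x - MEAN) / (SD * sqrt(2.0))))


def density(x: float) -> float:
    """Ordinate of the smooth probability curve at x."""
    return exp(-0.5 * ((x - MEAN) / SD) ** 2) / (SD * sqrt(2.0 * pi))


def rectangles(low: float, high: float, width: float) -> list:
    """(left edge, height) pairs; each rectangle's area is its interval's probability."""
    count = round((high - low) / width)
    result = []
    for i in range(count):
        left = low + i * width
        probability = cdf(left + width) - cdf(left)
        result.append((left, probability / width))
    return result


def total_area(rects: list, width: float) -> float:
    return sum(height * width for _, height in rects)


def largest_gap(rects: list, width: float) -> float:
    """Largest distance between a rectangle top and the curve at its midpoint."""
    return max(abs(h - density(left + width / 2.0)) for left, h in rects)


if __name__ == "__main__":
    for width in (4.0, 1.0, 0.25):
        rects = rectangles(50.0, 86.0, width)
        print(width, total_area(rects, width), largest_gap(rects, width))
```

Under these assumptions the total area stays at practically 1 for every class-interval width, while the gap between the rectangle tops and the curve shrinks as the intervals are narrowed.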

[Fig. 79. A continuous distribution of probability.]

Frequency Distributions as Probability Distributions. From the
definition of probability given above, it follows
that any frequency distribution in which the frequencies are
expressed as a percentage of the total number of cases is a
distribution of probability of the given set of cases. It likewise
follows that a frequency curve that represents the distribution
of relative frequency of an infinite population of cases¹ is also a
probability curve.



Since a distribution of relative frequency and a distribution of
probability are thus one and the same thing, all the measures of
the various characteristics of frequency distributions
automatically apply to probability distributions. Thus a probability
distribution has a mean, a standard deviation, a coefficient of
skewness, and a coefficient of kurtosis, like any frequency
distribution.
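
As an illustration, these measures can be computed for the distribution of probability among the faces of a die. The sketch assumes moment-based coefficients (the third moment divided by the cube of the standard deviation, and the fourth moment divided by the square of the variance); these may differ from the particular formulas adopted elsewhere in the text, and the function name is hypothetical.

```python
from fractions import Fraction
from math import sqrt


def distribution_measures(distribution: dict) -> dict:
    """Mean, variance, standard deviation, and moment coefficients of a distribution."""
    mean = sum(x * p for x, p in distribution.items())
    m2 = sum((x - mean) ** 2 * p for x, p in distribution.items())
    m3 = sum((x - mean) ** 3 * p for x, p in distribution.items())
    m4 = sum((x - mean) ** 4 * p for x, p in distribution.items())
    return {
        "mean": mean,
        "variance": m2,
        "standard_deviation": sqrt(m2),
        "skewness": float(m3) / float(m2) ** 1.5,  # assumed moment definition
        "kurtosis": m4 / m2 ** 2,                  # assumed moment definition
    }


die = {face: Fraction(1, 6) for face in range(1, 7)}

if __name__ == "__main__":
    for name, value in distribution_measures(die).items():
        print(name, value)
```

The probabilities simply take the place of the relative frequencies that would be used as weights in an observed frequency distribution.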



[Fig. 80. Horizontal axis: inches.]


¹ See p. 238.





ALGEBRAIC AND GRAPHIC REPRESENTATION OF THE NORMAL 
FREQUENCY CURVE

Functional Relationships. Before entering upon a mathe- 
matical and graphic representation of frequency and probability 
curves, it may be well to review briefly the algebraic and geo- 
metric description of simple functional relationships. This is 
the purpose of the present section. 

If one quantity varies when a second varies, the first is said to
be a “function” of the second. The pressure of air in an
automobile tire, for example, varies with the temperature; pressure is
thus a function of temperature. Again, the quantity of butter
bought varies with its price; hence, the purchase of butter is a
function of its price.

Functional relationships of this kind are described symbolically
by such expressions as y = f(x), y = F(x), y = G(x), y = φ(x),
and y = ψ(x), all of which are to be read “y is a function of x”
or, more specifically, “y is the f function of x,” “y is the F
function of x,” etc. The expressions y = f(x) and y = F(x) are the
most common; the others are often used, however, when a
problem involves more than one functional relationship. For
example, if y and z are both functions of x, this may be expressed
by y = f(x) and z = g(x).

Frequently a quantity varies, not merely with one, but with a
number of other quantities. The former may then be said to be
a “joint function” of the latter. Thus the volume of gas in a
tube is a function of the pressure and the temperature (a
generalization of Boyle’s law); the quantity of butter bought is a
function of the price of butter and the income of its purchasers.
Joint functional relationships of this kind are expressed by
y = f(x,z), y = F(x,z), y = φ(x,z), etc., all to be read “y is a
function of x and z.”

Explicit and Implicit Functions. The functional relationships
so far considered are “explicit” functions. In explicit functions
one variable is selected as the dependent variable, and the other
or others as the independent variables; this is indicated by
writing the dependent variable to the left of the equal sign. Often,
however, it is convenient to talk of two variables as being
functionally related without indicating which is to be taken as
dependent and which as independent variable. Such a functional
relationship is indicated by f(x,y) = 0, F(x,y) = 0,
φ(x,y) = 0, etc., or, if there are more than two variables, by
f(x,y,z) = 0, F(x,y,z) = 0, φ(x,y,z) = 0, etc. These all mean
that x and y, or x, y, and z, are “functionally related.” Functions
of this kind are called “implicit” functions. An explicit function
can often (although not always) be derived from an implicit
function by merely solving the latter for the variable selected as
dependent.

The simplest type of functional relationship is expressed by a
polynomial. A polynomial in x means such expressions as x
(strictly speaking a monomial), $a + x$, $a + bx$, and $a + bx + cx^2$.
The “degree” of the polynomial is the highest power of x that
occurs in the expression. Thus $a + bx$ is a polynomial of the
first degree; $a + bx + cx^2$ and $a + cx^2$ are polynomials of the
second degree; and $a + bx + cx^2 + dx^3$, $a + bx + dx^3$,
$a + cx^2 + dx^3$, and $a + dx^3$ are polynomials of the third degree.
Polynomials in two variables, say x and z, are illustrated by
$a + bx + gz$, $a + bx + cx^2 + gz + hz^2$, $a + bx + gz + mxz$, and
$a + bx + kz^3$, the first being of the first degree, the second and
third of the second degree, and the last of the third degree.

If y varies by a constant absolute amount every time x varies
by a fixed given amount, the function that expresses this
relationship is a first-degree polynomial in x, such as y = a + bx.
For every time x increases (or decreases) by one unit, y increases
(or decreases) by b units. Thus if y = 10 + 2x, y increases (or
decreases) by two units every time x increases (or decreases) by
one unit. If b is negative, then the variation in y is opposite in
direction to that of x. For example, if y = 10 − 2x, then y
decreases (or increases) by two units every time x increases (or
decreases) by one unit. The quantity a is the value of y when
x equals 0; if y = 0 when x = 0, then a must be zero.
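
The constant step of a first-degree polynomial is easily tabulated. The following sketch uses the two functions of the paragraph above, y = 10 + 2x and y = 10 − 2x; the helper names are hypothetical.

```python
def linear(a: float, b: float):
    """Return the function y = a + b*x for the given constants a and b."""
    return lambda x: a + b * x


def unit_steps(function, xs) -> list:
    """Change in y for each one-unit increase in x."""
    return [function(x + 1) - function(x) for x in xs]


rising = linear(10, 2)    # y = 10 + 2x
falling = linear(10, -2)  # y = 10 - 2x

if __name__ == "__main__":
    print(unit_steps(rising, range(-3, 4)))
    print(unit_steps(falling, range(-3, 4)))
    print(rising(0), falling(0))  # the value of y when x equals 0 is a
```

Whatever the starting value of x, the change in y per unit of x is the same, namely b, which is the property that marks the relationship as one of the first degree.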

When a functional relationship can be represented in this way
by a first-degree polynomial, such as y = a + bx, then y is said
to be a “linear” function of x. This continues to hold true
when there is more than one independent variable. Thus
y = a + bx + gz is said to express y as a linear function of
x and z. If the change in y that accompanies a given change in x
varies with the value of x, then the functional relationship can
no longer be expressed by a first-degree polynomial in x. The
function is in this case “nonlinear,” and a higher degree
polynomial or some more complex expression must be employed to
indicate the relationship. A nonlinear relationship between y
and two or more variables, such as x and z, must also be expressed
by some function other than a first-degree polynomial in these
two variables.

Graphs of Simple Functions. A pair of values, x and y, may
be represented by a point in a plane. The point P in Fig. 81,
for example, represents the value $x = x_0$ and $y = y_0$; i.e., the
coordinates of P are $(x_0, y_0)$.

[Fig. 81. Plotting of a point.]

First-degree Polynomials. The graph of a first-degree
polynomial is a straight line (hence the name “linear” relationship),
and conversely every straight line may be represented
algebraically by a polynomial of the first degree. The simplest way to
comprehend this relationship between the algebraic and the
geometric presentation is to think of a straight line in reference
to (1) the angle it makes with the x-axis and (2) the intercept
it cuts on the y-axis. Thus, in Fig. 82, let θ represent the angle
CAB that the line makes with the x-axis at A. The tangent of
this angle is the slope of the line AB. Let b represent this slope;
then b = tan θ. It is evident that a straight line is determined
when its slope b and its y intercept, a (or OB), are found, as follows:
Let P (whose coordinates are any x, y) be a representative point
of the line. Take PC, the perpendicular to Ox, from P, and
BD, the perpendicular to PC, from B. Then,

$$\tan DBP = \frac{DP}{BD}$$

or

$$DP = \tan DBP \times BD \qquad (1)$$

But

$$DP = CP - CD = CP - OB = y - a$$

$$BD = OC = x \quad\text{and}\quad \tan DBP = \tan CAP \quad\text{(similar triangles)}$$

and

$$\tan CAP = \tan\theta = b$$

Now since DP = y − a, and tan DBP = tan θ = b, and BD = x,
then upon substituting in Eq. (1),

$$y - a = bx \quad\text{or}\quad y = a + bx \qquad (2)$$

A straight line thus represents y as equal to a polynomial in x
of the first degree. When the line passes through the origin,
a = 0 and the functional relationship becomes simply y = bx.
Second-degret, Polynomials If the functional relationship 
between x and y takes tho form of a second-degree poljnomial, 
such ts y = a + lix -j- ex*, its 
/ graph will be a parabola 11> 
/ definition, a parabola is the locus 
C ~-F^—Jp(x,y') of all iioints equidistant from a 
\ , fixed i>oint called the "focus” and 
a fixed line called tho "dircctnx ” 

^ r 1 ^ ^ '5 ft>cu3 13 tho point 

^ t St ~ ^ (F) and the directrix is the line 

ho 83-Oraph^^o^ft^ para>K,u j>s (y + F = 0) Takean> point 
on the parabola, P{x\y'), and 
draw from it a line perpendicular to RS, at point M, then draw 
a line from the focus (F) to the point P 
At the point x = 0, y = 0, it i3 obvious that the parabola is 
equidistant from the directrix y -P F = 0, or y » — F, and the 
focus at X = 0, y = F 

Now, if a line were drawn from P perpendicular to the y-axis, 
at C, it is clear that FP* « (y' — F)* + (x')*, bincoFC ~ y’ — F 
and PC — z' Tins is true since tho square of the hypotenuse is 
equal to the sum of the squares of the other tw o sides of a right- 
angled triangle Turthermore, it can be seen from Fig 83 that 
MP* (y' -f because VP is y' -f F By hypothesis, 
Mpi s* Fp», and hence, by substitution, 

(y' + f )* = - F)* + (a/)* 

which IS true for any value of x* and y', that is, any x and y, and 



PROBABILITY DISTRIBUTIONS 


259 


which by transposition and simplification becomes 
= iFy or ^ 

This is of the general form of \( y = cx^2 \). If the curve had not
passed through the origin, its equation would have been of the
form \( y = a + cx^2 \); and if its vertex had come either to the right
or left of the y-axis, its equation would have been

\[ y = a + bx + cx^2 \qquad (3) \]

If the value of c is negative, the curve turns down as in Fig. 84
instead of up as in Fig. 83. These parabolic curves thus illustrate
the form of a functional relationship in which y is set equal
to a second-degree polynomial in x.
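The focus-and-directrix definition can be checked numerically. The following is a minimal sketch, not part of the original text; the function name `parabola_distances` and the value F = 1.5 are illustrative assumptions. For points on \( y = x^2/(4F) \) it compares the distance FP to the focus with the distance MP to the directrix.

```python
import math


def parabola_distances(x, focus_height):
    """Return (FP, MP) for the point P on y = x^2 / (4F) with abscissa x.

    FP is the distance from P to the focus (0, F);
    MP is the distance from P to the directrix y = -F.
    """
    y = x ** 2 / (4 * focus_height)
    to_focus = math.hypot(x, y - focus_height)
    to_directrix = y + focus_height
    return to_focus, to_directrix


F = 1.5  # assumed illustrative focus height
for x in (-3.0, 0.0, 0.5, 4.0):
    fp, mp = parabola_distances(x, F)
    print(f"x = {x:5.1f}   FP = {fp:.6f}   MP = {mp:.6f}")
```

In each printed row the two distances should agree, which is the hypothesis MP = FP used in the derivation.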



Fig. 84. Graph of a parabola, \( y = -cx^2 \) (c > 0).

Fig. 85. Graph of a circle, \( x^2 + y^2 = r^2 \).


The Circle. The implicit functional relationship

\[ x^2 + y^2 - r^2 = 0 \qquad (4) \]

is of interest in that its graph is a circle. By definition, a circle
is the locus of all points equidistant from a fixed point called the
"center." In Fig. 85, the center is taken at the origin. By the
property of right-angle triangles, the square of the distance of
any point P from the center at O (the origin) is simply \( x^2 + y^2 \).
Since by definition this must be the same for all points on the
circle, the equation of the circle is simply \( x^2 + y^2 = r^2 \), where
r is the radius. By transposition, this becomes \( x^2 + y^2 - r^2 = 0 \).
If the center of the circle had been at the point (a, b) instead of
at the origin, its equation would have been

\[ (x - a)^2 + (y - b)^2 - r^2 = 0 \]





An implicit functional relationship, therefore, in which two
variables enter to the second degree with identical coefficients
but in which there is no cross-product term (such as xy) is simply
the algebraic expression for a circle.

The Ellipse. If the functional relationship represented by a
circle is modified so that the coefficients of the second-degree
terms are no longer identical (but still retain the same signs), its
geometric counterpart is distorted so as to become an ellipse.
Thus the graph of the implicit functional relationship

\[ ax^2 + by^2 - r^2 = 0 \qquad (5) \]

is an ellipse whose semimajor axis is \( r/\sqrt{a} \) and whose semiminor
axis is \( r/\sqrt{b} \) (cf. Fig. 86). If b is less than a, then the ellipse
runs the other way, as in Fig. 87.


Fig. 86. Graph of an ellipse, \( ax^2 + by^2 - r^2 = 0 \) (a < b).

Fig. 87. Graph of an ellipse, \( ax^2 + by^2 - r^2 = 0 \) (a > b).


It will be noted that if either of the implicit relationships
\( x^2 + y^2 - r^2 = 0 \) or \( ax^2 + by^2 - r^2 = 0 \) is solved for y, the
resulting solution gives y, not as a single-valued function of x, but
as a double-valued function. Thus, in the case of the circle,
\( y = \pm\sqrt{r^2 - x^2} \), which shows that, for each value of x, there are
two values of y. Likewise for the ellipse,

\[ y = \pm\sqrt{\frac{r^2 - ax^2}{b}} \]

which again gives two values of y for each value of x. Geometrically
this means that a line perpendicular to the x-axis
cuts the circle and ellipse at two different points. In contrast,
the straight line and the parabola express y as a single-valued
function of x. The difference is due to the fact that, in the case
of the circle and ellipse, it is \( y^2 \) and not y that is expressed as a
polynomial in x, and when the square root is taken two values of
y result.
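The double-valued character of these solutions can be illustrated with a short computation. This is a sketch only; the function name `ellipse_y_values` and the numerical inputs are assumptions chosen for illustration. Setting a = b = 1 gives the circle.

```python
import math


def ellipse_y_values(x, a, b, r):
    """Solve a*x^2 + b*y^2 - r^2 = 0 for y at a given x.

    Returns a tuple holding the real roots: two values inside the curve,
    one value at its extreme left or right point, none outside it.
    """
    radicand = (r ** 2 - a * x ** 2) / b
    if radicand < 0:
        return ()
    root = math.sqrt(radicand)
    return (root, -root) if root > 0 else (0.0,)


# Circle of radius 5 (a = b = 1): a vertical line at x = 3 cuts it twice.
print(ellipse_y_values(3, 1, 1, 5))
# Ellipse with a = 1, b = 4, r = 2: at x = 0 the roots are +/- r / sqrt(b).
print(ellipse_y_values(0, 1, 4, 2))
```

A vertical line beyond the curve, for example x = 6 on the circle of radius 5, returns an empty tuple because the radicand is negative.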





Rising Exponential Function. A simple function that does not
express y or \( y^2 \) as a polynomial in x is the exponential, or
"compound-interest," function. Suppose that a sum of $1 is put
out at interest at 6 per cent per year compounded for 3 years;
then the value of this sum at the end of the 3 years would be
as follows:

\[ \text{Value at end of 3 years} = (1 + 0.06)(1 + 0.06)(1 + 0.06) = (1 + 0.06)^3 \]

In general, if $1 is put out at interest for x years and compounded
at the rate of r per annum, its value at the end of the x years
would be as follows:

\[ \text{Value at end of } x \text{ years} = (1 + r)^x \]

If the interest is compounded every 6 months instead of every
year, then the value at the end of x years would be as follows:

\[ \text{Value at end of } x \text{ years} = \left(1 + \frac{r}{2}\right)^{2x} \]

for the interest rate for 6 months is just half the rate for a year
and the number of 6-month periods is just double the number of
years. If the interest is compounded every quarter, then the
value of the $1 at the end of x years would be \( (1 + r/4)^{4x} \);
if the interest is compounded every 1/nth of a year, the value at
the end of x years would be \( (1 + r/n)^{nx} \), r always being the rate of
simple interest per year. This last may be written

\[ \left[\left(1 + \frac{1}{n/r}\right)^{n/r}\right]^{rx} \]

If, now, n is made infinitely large, in other words, if the period of
compounding (that is, 1/nth of a year) is made infinitesimally
small, so that the operation of compounding may be viewed as
practically continuous, then the quantity \( \left(1 + \frac{1}{n/r}\right)^{n/r} \) will be equal
approximately to e, the base of the Napierian system of logarithms.
For, by definition, e is the limit of \( \left(1 + \frac{1}{m}\right)^m \) as m
approaches infinity. The value of $1 at the end of x years when
interest at r per annum is compounded continuously is therefore
\( e^{rx} \).

The quantity e, it will be recalled, is a numerical constant
equal approximately¹ to 2.7183, so that the value of a sum
compounded continuously for x years is given by a function of
the form \( y = a^{bx} \), where a and b are merely constants. The
compound-interest function is thus an "exponential" function,
for the variable x enters into the function as an exponent. A
graph of this function for a positive value of b (= r) is shown in
Fig. 88.

Fig. 88. Graph of a rising exponential function, \( y = e^{rx} \) (r > 0).

Fig. 89. Graph of a declining exponential function, \( y = e^{-rx} \) (r > 0).

Declining Exponential Function. The "present value" of $1
due x years hence, interest being compounded continuously at
the rate of r per annum, would be equal to \( e^{-rx} \). For since the
sum of $1 would accumulate to \( e^{rx} \) dollars at the end of x years, the
sum of \( e^{-rx} \) (that is, \( 1/e^{rx} \)) dollars would accumulate to

\[ e^{-rx} \cdot e^{rx} = e^{0} = 1 \text{ dollar} \]


¹ This may be easily proved by putting increasingly large values of m in
the formula \( e = \left(1 + \frac{1}{m}\right)^m \). Thus

for m = 10, \( e = (1 + \tfrac{1}{10})^{10} = (1.1)^{10} = 2.594 \)

for m = 50, \( e = (1 + \tfrac{1}{50})^{50} = (1.02)^{50} = 2.691 \)

for m = 1,000, \( e = (1 + \tfrac{1}{1000})^{1000} = (1.001)^{1000} = 2.7171 \)

for m = 10,000, \( e = (1 + \tfrac{1}{10000})^{10000} = (1.0001)^{10000} = 2.7182 \), the
calculations being carried out by logarithms (see Appendix, Table I).





at the end of that time, and hence the present value of $1 due x
years hence is \( e^{-rx} \). This "discount" function, as it may be
called, is thus a negative exponential function. Its graph is
shown in Fig. 89.
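The passage from periodic to continuous compounding can be traced numerically. The sketch below is illustrative only: the function name `compound_value` is an assumption, the 6 per cent rate and 3-year term are taken from the example in the text, and the compounding frequencies are chosen arbitrarily.

```python
import math


def compound_value(r, x, n):
    """Value of $1 after x years at annual rate r compounded n times a year."""
    return (1 + r / n) ** (n * x)


r, x = 0.06, 3
for n in (1, 2, 4, 12, 365, 1_000_000):
    print(f"n = {n:>9,}   value = {compound_value(r, x, n):.6f}")
print(f"continuous    value = {math.exp(r * x):.6f}")

# The limit that defines e: (1 + 1/m)^m for increasingly large m.
for m in (10, 50, 1_000, 10_000):
    print(f"m = {m:>6,}   (1 + 1/m)^m = {(1 + 1 / m) ** m:.4f}")

# Present value of $1 due in x years, and its accumulation back to $1.
present_value = math.exp(-r * x)
print(present_value, present_value * math.exp(r * x))
```

The printed values rise with n toward \( e^{rx} \), and the last line shows the discounted sum accumulating back to one dollar, up to floating-point rounding.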

Whereas the exponential functions are not themselves polynomials,
it will be noted that the expression that constitutes the
"exponent of e" is a first-degree polynomial in x (more accurately,
the monomial x). This suggests that interesting modifications
of the exponential function might be obtained by making the
exponent of e a second or higher degree polynomial in x. A
function of this kind that is of very great importance in the theory
of frequency curves is \( y = e^{-cx^2} \). Figure 90
shows how well this function represents the general shape of a
frequency distribution; as will be seen shortly, it is the kernel of
the formula for the normal frequency curve.

Fig. 90. Graph of the curve \( y = e^{-cx^2} \) (c > 0).

Formula for the Normal Frequency and Probability Curves.
Formulas for frequency and probability curves may be written
in two ways. The first, which may be symbolized by \( y = \varphi(X) \),
merely expresses the ordinate of the curve y as a function of the
attribute X whose frequency is being measured. (\( \varphi \) is used here
to signify "function of," instead of the usual F or f, in order to
avoid confusion with the employment of the latter to indicate
frequencies.) In this form the formula simply describes the
locus of the points that constitute the frequency or probability
curve.

The second method of writing a frequency or probability
formula is \( d(F/N) \) or \( dP = \varphi(X)\,dX \). In this form, the probability
or relative frequency of a case lying between X and
X + dX is expressed as a function of the attribute X. It will
be recalled that a probability or frequency curve is the limit
approached by an area histogram as the class interval is made
infinitesimally small. The expression \( d(F/N) \) or \( dP = \varphi(X)\,dX \)
merely says that, when the class interval (of size dX)¹ is made
infinitesimally small, the area under the curve for any class
interval [that is, \( d(F/N) \) or \( dP \)] is approximately equal to the
area of a small rectangle whose base is dX and whose height is
the ordinate of the curve \( \varphi(X) \) at X (cf. Fig. 91). This second
method is, strictly speaking, the proper method for describing a
"probability" or "frequency" curve, since it is this and not the
first that actually expresses the probability or frequency of a given
class interval as a function of the attribute X. The former gives
merely an algebraic expression for the curve and not the area under
the curve.²

¹ The letters dX, dP, d(F/N), etc., are to be read as a single symbol and
not as the product d and X or d and P, etc. The symbol dX means an
infinitesimally small part of the X range, the symbol dP means an infinitesimal
probability, and the symbol d(F/N) means an infinitesimal relative
frequency.

² The first method of description is, however, the only form that is
appropriate for a discrete distribution.

Fig. 91. Graphical representation of a probability function; the area
under the curve over the interval dX is approximately the area of the
rectangle.

The Normal Curve. One curve occurs very often in statistical
analysis, especially in the theory. This is the normal
frequency curve. Its algebraic and graphic representation may
profitably be illustrated by a brief description.

The mathematical formula for the normal curve is

\[ dP = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X - \bar{X})^2}{2\sigma^2}}\, dX \]

where e (= 2.7183+) is the base of the Napierian system of
logarithms, \( \bar{X} \) is the mean of the distribution, and \( \sigma \) is its standard
deviation. Pictures of the curve are given in Figs. 92a and
92b. It will be noted that the curve is symmetrical and generally
bell-shaped.

Owing to its symmetry, the center of the curve comes at the
mean point \( X = \bar{X} \). Here the curve also reaches its greatest
height, \( \frac{1}{\sigma\sqrt{2\pi}} \), and from there it slopes gradually downward
on each side as the factor \( e^{-\frac{(X - \bar{X})^2}{2\sigma^2}} \) assumes greater and greater
significance.¹ The points of inflection, i.e., the points at which
the left and right branches change from concave to convex
downward, fall at \( X = \bar{X} \pm \sigma \). About two-thirds of the area
under the curve lies between these two points of inflection, and
about 95 per cent between \( \bar{X} - 2\sigma \) and \( \bar{X} + 2\sigma \). The average
deviation equals 0.7979 times the standard deviation, and the
fourth moment equals three times the square of the second
moment (hence \( \beta_2 = 3 \)).
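The numerical properties just listed can be checked with a short computation. This is a sketch under stated assumptions, not the authors' method: the function names are illustrative, the mean of 4 and standard deviation of 1.5 are arbitrary, the areas are obtained from the error function in Python's `math` module, and the moments are estimated by a simple midpoint rule.

```python
import math


def normal_ordinate(X, mean, sd):
    """Height of the normal curve at X for the given mean and standard deviation."""
    return math.exp(-((X - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))


def area_within(k):
    """Area under the normal curve between mean - k*sd and mean + k*sd."""
    return math.erf(k / math.sqrt(2))


def mean_power_of_deviation(power, mean, sd, absolute=False, steps=20_000, width=10.0):
    """Midpoint-rule estimate of the average of (X - mean)**power under the curve."""
    dX = 2 * width * sd / steps
    total = 0.0
    for i in range(steps):
        X = mean - width * sd + (i + 0.5) * dX
        dev = abs(X - mean) if absolute else X - mean
        total += dev ** power * normal_ordinate(X, mean, sd) * dX
    return total


mean, sd = 4.0, 1.5  # assumed illustrative values
average_deviation = mean_power_of_deviation(1, mean, sd, absolute=True)
second_moment = mean_power_of_deviation(2, mean, sd)
fourth_moment = mean_power_of_deviation(4, mean, sd)
beta_2 = fourth_moment / second_moment ** 2

print(f"area within 1 sd: {area_within(1):.4f}")
print(f"area within 2 sd: {area_within(2):.4f}")
print(f"average deviation / sd: {average_deviation / sd:.4f}")
print(f"beta_2: {beta_2:.4f}")
```

The first two figures correspond to the "about two-thirds" and "about 95 per cent" statements, and the last two to the ratio 0.7979 and to \( \beta_2 = 3 \).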



Fig. 92a. Graphs of normal frequency curves with different means but same
standard deviations.


As will be noted from the formula, a particular normal curve
is determined when its mean \( \bar{X} \) and its standard deviation \( \sigma \) are
given. Different normal curves, then, will have either different
means or different standard deviations, or both. If the means
are different, the curves will have different positions on the X-axis;
if the standard deviations are different, the curves will be
of different widths. This is illustrated in Figs. 92a and 92b.

Although normal curves may thus differ in respect to their
means and standard deviations, they all possess the essential
"normal" form. This may be brought out by measuring the
attribute X, not in original units, such as pounds or dollars, but
as a deviation from the mean of X measured in standard-deviation
units, i.e., as \( \frac{X - \bar{X}}{\sigma} \) or \( \frac{x}{\sigma} \). Whenever this is done, all
normal curves become one and the same curve.

¹ It will be noted that \( e^{-\frac{(X - \bar{X})^2}{2\sigma^2}} = 1 \big/ e^{\frac{(X - \bar{X})^2}{2\sigma^2}} \). When \( X = \bar{X} \), this becomes
\( 1/e^0 = 1/1 = 1 \). As X moves away from \( \bar{X} \) in either direction, \( e^{\frac{(X - \bar{X})^2}{2\sigma^2}} \)
becomes larger and larger and hence \( 1 \big/ e^{\frac{(X - \bar{X})^2}{2\sigma^2}} \) becomes smaller and smaller.
All the ordinates are positive since the exponent of e is squared.


Suppose, for example, that X in one case represents pounds
and in another quarts. In both cases the probability or relative
frequency is "normally" distributed, but the mean of the first
distribution is 4 pounds and its standard deviation is 1.5 pounds.
This distribution is represented by curve A in Fig. 92a. The
second distribution has a mean of 6 quarts and a standard
deviation of 0.5 quart; it is represented by curve D in Fig. 92b.
If, however, the unit adopted for the measurement of the attribute
of an element is in the first case \( \frac{X - 4 \text{ lb.}}{1.5 \text{ lb.}} \) and in the
second case \( \frac{X - 6 \text{ qt.}}{0.5 \text{ qt.}} \), then in terms of these \( \frac{x}{\sigma} \)
units the two distributions will be identical. It is thus possible
to reduce all normal distributions to a standard normal form¹
(see Fig. 93).

Fig. 92b. Graphs of normal frequency curves with same mean but different
standard deviations.

Consider now the relationship between the mathematical
formula for the curve and this standard normal form. In the
individual or nonstandard form the formula for the curve is
that given above, viz.,

\[ dP = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X - \bar{X})^2}{2\sigma^2}}\, dX \]

This says that the infinitesimal portion of the total area under
the curve, dP, cut off by the infinitesimal class interval X to
X + dX is approximately equal to the area of an infinitesimal
rectangle whose height is \( \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X - \bar{X})^2}{2\sigma^2}} \) and whose base is dX.
If now the attribute is expressed in \( \frac{X - \bar{X}}{\sigma} = \frac{x}{\sigma} \) units, the
mathematical formula becomes

\[ dP = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2}\, d\!\left(\frac{x}{\sigma}\right) \]

In this case, the infinitesimal portion of the total area cut off
by the infinitesimal class interval \( \frac{x}{\sigma} \) to \( \frac{x}{\sigma} + d\!\left(\frac{x}{\sigma}\right) \) is expressed as
the area of a rectangle whose height is \( \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2} \) and whose
base is \( d(x/\sigma) \). In other words, the effect of measuring the attribute
in \( x/\sigma \) units instead of X units is to change the size of the
unit class interval on which the rectangle rests but at the same
time to change its height proportionately so its area dP remains
the same.

¹ It will be noted that in the standard form the height of the middle
ordinate is \( 1/\sqrt{2\pi} = 0.3989 \).

Fig. 93. Graph of a standard normal curve.

A table giving the value of \( \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2} \) for various values of
\( x/\sigma \) will be found in the Appendix, Table VII. These are the
ordinates of the standard normal curve. It was by means of
this table that Fig. 93 was plotted; its use in "fitting" the normal
curve to an actual set of data will be explained in the next
chapter.¹

¹ See pp. 277 and 295-304.
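The reduction to standard form can be illustrated with the two distributions used earlier (mean 4 lb. with standard deviation 1.5 lb., and mean 6 qt. with standard deviation 0.5 qt.). The sketch below is illustrative; the function names and the particular \( x/\sigma \) values are assumptions. It multiplies each nonstandard ordinate by \( \sigma \), which is the proportionate change in height described above, and compares the result with the standard ordinate.

```python
import math


def normal_ordinate(X, mean, sd):
    """Ordinate of the nonstandard normal curve at X."""
    return math.exp(-((X - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))


def standard_ordinate(z):
    """Ordinate of the standard normal curve at z = x / sigma."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


cases = {"pounds": (4.0, 1.5), "quarts": (6.0, 0.5)}
for unit, (mean, sd) in cases.items():
    for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
        X = mean + z * sd                  # attribute in original units
        rescaled = normal_ordinate(X, mean, sd) * sd
        print(f"{unit:6s}  X = {X:5.2f}  x/sigma = {z:+.1f}  "
              f"ordinate * sigma = {rescaled:.4f}  standard = {standard_ordinate(z):.4f}")
```

Both distributions give the same column of rescaled ordinates, and the middle ordinate agrees with the value 0.3989 quoted in the footnote.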



CHAPTER X
PROBABILITY CALCULUS 

Two Fundamental Theorems. There are two fundamental
theorems in the calculus of probabilities. These are the addition
theorem and the multiplication theorem. There is also a special
form of the latter that pertains only to "independent" attributes.

Addition Theorem. The addition theorem pertains to the
summation of the probabilities of one and the same probability
set. Since the several attributes of a given probability set are
necessarily mutually exclusive, it follows that the relative frequency
of cases having either one of two attributes is the sum
of the relative frequencies of the separate attributes. For
example, there are 13 hearts in an ordinary deck of 52 playing
cards and also 13 diamonds. The relative frequency of a heart
is therefore 13/52, and the relative frequency of a diamond is also 13/52.
The relative frequency of a red card, i.e., either a heart or a
diamond, is 26/52, which equals 13/52 + 13/52. Since the relative
frequencies of the attributes are by definition their probabilities, it
follows that the probability of a member of a set having either
one of several attributes is simply the sum of the individual
probabilities. The probability of a heart or a diamond is thus
13/52 + 13/52 = 26/52 = 1/2. This addition theorem is valid for infinite
probability sets as well as for finite sets.
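The card example can be reproduced by direct counting. The following is a minimal sketch with illustrative variable names; it builds an ordinary 52-card deck, takes relative frequencies as probabilities, and compares the sum given by the addition theorem with a direct count of red cards.

```python
from collections import Counter
from fractions import Fraction

suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for suit in suits for rank in range(1, 14)]

# Relative frequency of each suit, i.e., its probability in this set.
counts = Counter(suit for _, suit in deck)
p = {suit: Fraction(n, len(deck)) for suit, n in counts.items()}

# Addition theorem: hearts and diamonds are mutually exclusive attributes
# of the same set, so their probabilities add.
p_red_by_addition = p["hearts"] + p["diamonds"]

# Direct count of red cards for comparison.
red_cards = sum(1 for _, suit in deck if suit in ("hearts", "diamonds"))
p_red_by_counting = Fraction(red_cards, len(deck))

print(p["hearts"], p["diamonds"], p_red_by_addition, p_red_by_counting)
```

Exact fractions are used so that the two results can be compared without rounding.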

Algebraically, the addition theorem may be expressed as
follows: If the attributes of a given set are \( X_1, X_2, \ldots, X_n \)
(representing either qualitative or quantitative characteristics)
and their probabilities are \( p_1, p_2, \ldots, p_n \), then the probability of
\( X_1 \), \( X_2 \), or \( X_3 \), say, that is, the attribute of being any one of these
X's, is simply \( p_1 + p_2 + p_3 \). If the variation in attributes
within the set is continuous and if the distribution of probability
is described by a formula such as \( dP = \varphi(X)\,dX \), then the probability
of an attribute within any one of a number of small
ranges dX whose sum constitutes the range \( X_1 \) to \( X_2 \) is given
by \( \Sigma\,dP = \Sigma\,\varphi(X)\,dX \), or, in the symbolism of the integral
calculus, \( \int dP = \int_{X_1}^{X_2} \varphi(X)\,dX \). The significance of this theorem
will become clearer when its application to particular problems is
considered.

The addition theorem is sometimes stated as follows: The probability
of either one of two mutually exclusive events (attributes)
is the sum of their individual probabilities. This version of the
theorem is perfectly valid if it is understood that the two attributes
belong to the same set. It will be recalled that all the
attributes of a set are mutually exclusive.¹ It is not true, however,
that all mutually exclusive attributes belong to the same
set. To illustrate this point von Mises gives the following
example:²

Suppose that the probability of a man dying between his
fortieth and forty-first birthdays is 0.011 and the probability of
his marrying between his forty-first and forty-second birthdays
is 0.009. These events are mutually exclusive, but it cannot
be said that the probability of a man either dying in his fortieth
year or marrying in his forty-first year is 0.011 + 0.009 = 0.02.
The two events do not belong to the same set, and the addition
theorem can be validly applied only to attributes of one and the
same set.

Multiplication Theorem. The multiplication theorem pertains
to the calculation of a probability of a "derived" or "second-order"
probability set from the probabilities of two or more
"first-order" sets. Consider, for example, an ordinary deck of
playing cards and a pinochle deck. Each may be said to constitute
a "first-order" probability set. In the first set the probability
of an ace is 4/52 = 1/13, and in the second the probability of
an ace is 8/48 = 1/6. Let the third set be formed from these two
first-order sets by combining each card of one first-order set with
each card of the other first-order set. Furthermore, let the
attribute of any pair be the values of the cards making up the
pair, such as a king and a nine, an ace and a queen, a two and a
ten, and the like. The probability of any one attribute in this
second-order set, say the probability of a pair consisting of two
aces, is the relative frequency of such pairs among all possible
pairs that might be formed. It is the purpose of the multiplication
theorem to give a general rule by which a probability of a
second-order set of this kind can be computed from the probabilities
of the first-order sets.

¹ See p. 252.

² Probability, Statistics, and Truth, p. 54.

In deriving the multiplication theorem, two cases must be
distinguished, one pertaining to independent probabilities and
the other to dependent probabilities. Consider first the case of
independent probabilities. Suppose that in finding all the various
pairs of cards that might be composed of one card from each
deck there is no limitation on how the cards might be matched.
Then each card of the ordinary deck will have associated with it
every card of the pinochle deck. Hence the set of pinochle cards
that are paired with the ace of spades, say, from the ordinary
deck will be the same as the set of pinochle cards paired with the
jack of diamonds, say, or in fact with any other card from the
ordinary deck.

Since the set of pinochle cards paired with each card of the
ordinary deck is thus the same as the set of pinochle cards paired
with every other card from the ordinary deck and since this
common set is identical with the original pinochle set, it follows
that the probability of any given pinochle card in the set paired
with any given card of the ordinary deck is the same as the
probability of that pinochle card in the original pinochle deck.
This being the case, the probability of a card from the pinochle
deck is said to be independent of the attribute of the card from
the ordinary deck.

In general, the probabilities of set II are said to be independent
of the attributes of probability set I if, when the members of the
two sets are paired together, the probability of an attribute in
the subset of members of set II paired with any given member of
set I is the same as the probability of that attribute in the
original set II. Symbolically, P(B) is independent of A if P(B/A),
that is, the probability of B given A, equals P(B), that is, the
probability of B.

Consider once again the given example. Since each card of the
ordinary deck is associated with each card of the pinochle deck,
the total number of different¹ pairs of cards that can be formed is
52 × 48, for there are 52 ways in which the first card can be
picked and 48 ways in which the second card can be picked.
Likewise, the number of combinations that would consist of two
aces would be 4 × 8 = 32. Hence the probability of a pair of
aces in the whole set of pairs of cards is 32/2,496 = 1/78. But
from the calculations this is seen to be equal to 1/13 × 1/6 = 1/78.
In other words, the probability of a pair of aces in the second-order
probability set is equal to the probability of an ace in the
ordinary deck times the probability of an ace in the pinochle
deck.

¹ Different in the sense that the cards going to make up any pair are not
precisely the same cards as those making up any other pair. This means
that the two aces of spades, the two kings of spades, the two jacks of
diamonds, etc., in the pinochle deck must be considered as different cards, even
though their value and suit are the same.
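The count of pairs in the independent case can be verified by enumeration. This sketch assumes the usual pinochle deck of 48 cards (two copies each of the nine, ten, jack, queen, king, and ace in four suits), which agrees with the 8 aces and 48 cards used in the text; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

SUITS = ("spades", "hearts", "diamonds", "clubs")
ORDINARY_RANKS = ("A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K")
PINOCHLE_RANKS = ("9", "10", "J", "Q", "K", "A")

ordinary = [(rank, suit) for suit in SUITS for rank in ORDINARY_RANKS]
# The copy number keeps the two identical pinochle cards distinct.
pinochle = [(rank, suit, copy) for suit in SUITS for rank in PINOCHLE_RANKS for copy in (1, 2)]

# Every ordinary card is paired with every pinochle card (no restriction).
pairs = list(product(ordinary, pinochle))
ace_pairs = sum(1 for first, second in pairs if first[0] == "A" and second[0] == "A")

p_pair_of_aces = Fraction(ace_pairs, len(pairs))
p_ace_ordinary = Fraction(sum(card[0] == "A" for card in ordinary), len(ordinary))
p_ace_pinochle = Fraction(sum(card[0] == "A" for card in pinochle), len(pinochle))

print(len(pairs), ace_pairs, p_pair_of_aces, p_ace_ordinary * p_ace_pinochle)
```

The enumeration and the product of the two first-order probabilities give the same fraction.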

The multiplication theorem for independent probabilities may
thus be stated as follows: If \( p_A \) is the probability of a member
of set I having the attribute A and if \( q_B \) is the probability of a
member of set II having the attribute B, and if \( q_B \) is independent
of the attribute assumed by a member of set I, then the probability
of a member of the derived set I-II having the attribute
AB, that is, the probability of a pair AB among all possible pairs
of two elements from each of the two sets I and II, is the product
of the probabilities \( p_A \) and \( q_B \). In simpler form, if P(B) is independent
of A, the joint probability of A and B is equal to the
probability of A times the probability of B, that is,
\( P(AB) = P(A) \cdot P(B) \).

Consider next the case of dependent probabilities. Suppose
that, in picking pairs of cards from the two decks, the following
modification is introduced. Suppose that every time the card
picked from the ordinary playing-card deck is an ace, a king, a
queen, or a jack, then, before any selection is made from the
pinochle deck, an ace, a king, and a queen from each suit are
discarded from the deck and are replaced by a jack, a ten, and a
nine of each suit. The pinochle deck would then contain the
same number of cards as before, but it would have 4 aces, kings,
and queens instead of 8, and 12 jacks, tens, and nines instead of 8.
After the pinochle deck has been modified in this way, let a card
be selected from it and combined with the ace, king, queen, or
jack picked from the ordinary deck. If the card picked from the
ordinary deck is not an ace, a king, a queen, or a jack, let no
modification be made in the pinochle deck. The effect of this
modification in the method of forming pairs of cards is to make
the attributes of the second set dependent on those of the first.
For the set of cards of the pinochle deck associated with an ace,
a king, a queen, or a jack of the ordinary deck is different from
the set of cards of the pinochle deck associated with other cards
of the ordinary deck. The probability of a given type of card
from the pinochle pack now depends on the card from the
ordinary deck; no longer does P(B/A) = P(B).

The number of different pairs of cards that can be made in the
way just described is 52 × 48 as before, for the first card can still
be selected in 52 ways and the second in 48 ways. The number
of pairs of cards consisting of two aces, however, is now 4 × 4
instead of 4 × 8 as in the previous example. Hence the probability
of a pair of aces among the whole set of pairs is

\[ \frac{4 \times 4}{52 \times 48} = \frac{1}{156} \]

It will be noted, however, that this probability of a pair of aces
is the product of the probability of an ace in the ordinary deck
(that is, 1/13) times the probability of an ace in the modified
pinochle deck (that is, 1/12). In other words, the probability of a
pair of aces is the probability of an ace in the ordinary deck
times the probability of an ace in the pinochle deck given the
selection of an ace from the ordinary deck.
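The dependent case can be enumerated in the same way. The sketch below is illustrative and rests on stated assumptions: the original pinochle deck holds two copies of each of the six ranks in each suit, and the modified deck described above holds one ace, king, and queen and three jacks, tens, and nines in each suit. The helper `pinochle_deck` and the variable names are hypothetical.

```python
from fractions import Fraction

SUITS = ("spades", "hearts", "diamonds", "clubs")
ORDINARY_RANKS = ("A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K")


def pinochle_deck(high_copies, low_copies):
    """Build a 4-suit deck with the given number of copies of A, K, Q and of J, 10, 9."""
    deck = []
    for suit in SUITS:
        for rank in ("A", "K", "Q"):
            deck += [(rank, suit)] * high_copies
        for rank in ("J", "10", "9"):
            deck += [(rank, suit)] * low_copies
    return deck


ordinary = [(rank, suit) for suit in SUITS for rank in ORDINARY_RANKS]
original = pinochle_deck(2, 2)   # 8 of every rank
modified = pinochle_deck(1, 3)   # 4 aces, kings, queens; 12 jacks, tens, nines

# The pinochle deck used depends on the card drawn from the ordinary deck.
pairs = []
for first in ordinary:
    second_deck = modified if first[0] in ("A", "K", "Q", "J") else original
    pairs.extend((first, second) for second in second_deck)

ace_pairs = sum(1 for first, second in pairs if first[0] == "A" and second[0] == "A")
p_pair_of_aces = Fraction(ace_pairs, len(pairs))
p_a = Fraction(sum(card[0] == "A" for card in ordinary), len(ordinary))
p_b_given_a = Fraction(sum(card[0] == "A" for card in modified), len(modified))

print(len(pairs), ace_pairs, p_pair_of_aces, p_a * p_b_given_a)
```

The enumerated probability equals P(A) times P(B/A), and differs from the 1/78 obtained when the pinochle deck is left unmodified.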
The multiplication rule for dependent attributes may thus be
stated as follows: If members of probability set II are paired
with members of probability set I in such a way that the probability
of an attribute in the subset of members of set II paired
with any given member of set I varies with the attribute
of that member of set I, then the probability of the pair AB in
the whole set of paired values is equal to the product of the
probability of A in set I times the probability of B in that subset
of set II associated with the given attribute A. Symbolically,
the multiplication rule for dependent attributes is

\[ P(AB) = P(A) \cdot P(B/A) \]

that is, the probability of AB equals the probability of A times
the probability of B given A. Since, when P(B) is independent
of A, P(B/A) = P(B), the multiplication rule for independent
probabilities is a special case of the general formula

\[ P(AB) = P(A) \cdot P(B/A) \]

The multiplication theorem for both dependent and independent
probabilities is valid for infinite sets as well as finite sets.





The significance of independence and dependence may be
illustrated by a case pertaining to real life. Suppose that one
probability set consists of American fathers of the white race
and the other probability set consists of their eldest sons. Suppose
the attributes distinguished in each set are dying from
cancer, dying from heart disease, dying from tuberculosis, and
dying from other causes. If, now, the probability of a son dying
from tuberculosis, say, is greater among those sons whose fathers
died of tuberculosis than among the whole group of sons, then
the probability of a son dying from a certain cause is not independent
of the cause of death of his father. If, on the other
hand, the probability of a son dying from any particular cause
is the same for the sons whose fathers died from cancer, the
sons whose fathers died from heart disease, the sons whose fathers
died from tuberculosis, and the sons whose fathers died from
other causes, i.e., if the probability of a son dying from any
particular cause is the same, whatever the cause of the father's
death, then the probability of a son's death is independent of the
cause of death of his father. For example, a case of dependence
would be the following:


                                     Probability of death of eldest son from
Eldest sons whose fathers died of    Cancer    Heart disease    Tuberculosis    Other causes
Cancer                               0.310     0.102            0.030           0.558
Heart disease                        0.218     0.151            ...             0.590
Tuberculosis                         0.220     0.118            ...             ...
Other causes                         0.215     0.112            0.042           ...
All eldest sons                      0.228     0.120            0.046           0.606

A case of independence would be that in which the figures
in every row of each column were the same and these in turn
were the same as the figures for "All sons." If the probability
of a father dying from cancer was 0.228, from heart disease
0.120, from tuberculosis 0.046, and from other causes 0.606,
then, in the case of dependence represented by the above table,
the probability of both a father and an eldest son dying from
heart disease would be (0.120)(0.151) = 0.018. In the case of
independence, on the other hand, the probability of both a
father and an eldest son dying of heart disease would be

(0.120)(0.120) = 0.014
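The two products can be computed from the general rule P(AB) = P(A) · P(B/A). The sketch below uses only the figures quoted in the text; the function name `joint_probability` and the variable names are illustrative.

```python
def joint_probability(p_a, p_b_given_a):
    """General multiplication rule: P(AB) = P(A) * P(B/A)."""
    return p_a * p_b_given_a


p_father_heart = 0.120                  # father dies of heart disease
p_son_heart_given_father_heart = 0.151  # son's figure in the heart-disease row
p_son_heart = 0.120                     # son's figure for all eldest sons

# Dependence: use the conditional probability from the father's row.
dependent = joint_probability(p_father_heart, p_son_heart_given_father_heart)

# Independence: P(B/A) reduces to P(B).
independent = joint_probability(p_father_heart, p_son_heart)

print(round(dependent, 3), round(independent, 3))
```

Under independence the conditional probability is simply replaced by the unconditional one, which is why the special rule is a particular case of the general formula.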

Illustrations. The following examples will help to illustrate
the use of the addition and multiplication theorems in the
calculation of desired probabilities. Some will also serve to
illustrate the use of the normal probability curve.

Examples Involving Discrete Distributions. Suppose that a
gambling game consists of the random tossing of five coins.
You agree to pay your opponent a predetermined sum of money
whenever all five coins turn up heads; he agrees to pay you a
predetermined sum whenever any other result occurs. The question
is: What should the odds be to make the game a fair one?
The answer is obtained as follows:

Assume that the character of the coins and the method of
tossing are such as to cause each coin to tend to turn up heads
and tails in equal proportion. In a large number of tossings,
therefore, the probability of a head on each coin may be taken
as 1/2. Assume, also, that the method of tossing is such as to
make the tosses of each coin independent of the others. Then
the probability of heads on all five coins is

\[ \left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right) = \left(\tfrac{1}{2}\right)^5 = \tfrac{1}{32} \]

Accordingly, the fair odds are 31 to 1. That is, the game will
be fair if you agree to pay your opponent $31 every time five
heads occur and he agrees to pay you $1 every time some other
result occurs. Of course, in an actual game the assumption
regarding the character of the coins and the method of tossing
would have to be checked by examination of the coins and by
trial tossings. This is an illustration of the multiplication
theorem for independent probabilities.
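The five-coin result can be confirmed by listing every equally likely outcome. This is a minimal sketch; the function name `fair_odds` is an assumption, and the odds are expressed as the ratio of the probability of the event failing to the probability of its occurring.

```python
from fractions import Fraction
from itertools import product


def fair_odds(p):
    """Odds against an event of probability p, as the ratio (1 - p) / p."""
    return (1 - p) / p


# All 2^5 equally likely results of tossing five independent, balanced coins.
outcomes = list(product("HT", repeat=5))
all_heads = sum(1 for toss in outcomes if all(face == "H" for face in toss))

p_by_counting = Fraction(all_heads, len(outcomes))
p_by_multiplication = Fraction(1, 2) ** 5

print(p_by_counting, p_by_multiplication, fair_odds(p_by_counting))
```

With these odds the expected payment in each direction is the same: 31 dollars paid with probability 1/32, against 1 dollar received with probability 31/32.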
Another gambling game consists of the throwing of two dice.
You agree to pay your opponent a predetermined sum whenever
a combination totaling 7 appears, and he agrees to pay
you a predetermined sum whenever another result appears.
Again the problem is to determine fair odds. This may be done
by a combined application of the multiplication and addition
theorems.





Assume again that the dice are of such a character and are so
thrown that all faces tend to turn up in equal proportions. The
probability of any given result for each die is therefore 1/6. The
six possible combinations that add up to 7 are (1,6), (2,5), (3,4),
(4,3), (5,2), and (6,1). If it is assumed that the dice are thrown
so as to give independent results, then, by the multiplication
theorem for independent probabilities, the probability of each
one of the above combinations is (1/6)(1/6) = 1/36. Any one of the
combinations, however, will yield a total of 7. The probability
of a total of 7 is therefore the probability of any one of these
combinations, which, by the addition theorem, is

\[ \tfrac{1}{36} + \tfrac{1}{36} + \tfrac{1}{36} + \tfrac{1}{36} + \tfrac{1}{36} + \tfrac{1}{36} = \tfrac{6}{36} = \tfrac{1}{6} \]

Hence, fair odds are 5 to 1; that is, the game will be fair if you
pay your opponent $5 every time a 7 occurs and he pays you $1
every time some other total occurs. Again, in a real game the
character of the dice and the method of throwing should be
checked to see if the above assumptions are warranted.
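The combined use of the two theorems in the dice game can be sketched as follows. The names are illustrative; each ordered combination gets its probability from the multiplication theorem, and the probabilities of the combinations totaling 7 are then added.

```python
from fractions import Fraction
from itertools import product


def fair_odds(p):
    """Odds against an event of probability p, as the ratio (1 - p) / p."""
    return (1 - p) / p


p_face = Fraction(1, 6)
combinations = list(product(range(1, 7), repeat=2))  # ordered (first die, second die)

# Multiplication theorem: each ordered combination has probability 1/6 * 1/6.
p_combination = p_face * p_face

# Addition theorem: sum over the mutually exclusive combinations totaling 7.
sevens = [pair for pair in combinations if sum(pair) == 7]
p_seven = sum(p_combination for _ in sevens)

print(sevens)
print(p_seven, fair_odds(p_seven))
```

The list of combinations printed is the same six pairs enumerated in the text.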

Consider still a third gambling game. Suppose that two cards
are drawn at random from a well-shuffled pack, the suit is noted,
the cards are returned to the pack, the latter is shuffled, and
the whole operation is repeated. Each time the cards are both
spades you agree to pay your opponent a predetermined sum;
if they are otherwise, he pays you. What are fair odds?

Assume as in the other games that the method of drawing
cards and the method of shuffling are such that all cards tend
to be drawn in equal proportion. The probability of a spade
among the first cards drawn will thus be 13/52, assuming the usual
deck of 13 spades, 13 hearts, 13 diamonds, and 13 clubs. If
the first card drawn is a spade, the remainder of the deck contains
12 spades and 13 of each of the other suits. If in a large
set of drawings each of these remaining cards tends to turn up
in the same proportion as every other card, then the probability
of a spade among the second cards drawn will be 12/51. This is
the probability of a spade on the second draw, assuming a spade
on the first draw. Then, according to the multiplication theorem
for dependent probabilities, the probability of a spade on both
the first and second draws is (13/52)(12/51) = 1/17. The odds will be
fair, therefore, if you pay $48 every time two spades are drawn
and your opponent pays $3 every time any other combination
is drawn. Again the assumptions would have to be checked in a
real game.
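
The card example can be verified in the same way. This is a small sketch, assuming a standard 52-card deck and drawing without replacement within each pair of draws; the names `p_both` and `odds_against` are illustrative only.

```python
from fractions import Fraction

# Probability of a spade on the first draw.
p_first_spade = Fraction(13, 52)

# Probability of a spade on the second draw, given a spade on the first:
# 12 spades remain among 51 cards.
p_second_given_first = Fraction(12, 51)

# Multiplication theorem for dependent probabilities.
p_both = p_first_spade * p_second_given_first

# Fair odds against two spades are (1 - p) to p.
odds_against = (1 - p_both) / p_both

# With stakes of $48 and $3 the expected gain of each player is equal.
expected_paid_by_you = 48 * p_both
expected_paid_by_opponent = 3 * (1 - p_both)

print(p_both, odds_against)
```

The stakes of $48 and $3 stand in the ratio 16 to 1, which is the ratio the sketch computes for the odds against two spades.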

Examples Involving Continuous Distributions. All the examples
so far have been concerned with discrete probability distributions.
Much of the practical work in statistics, however, is
concerned with continuous distributions. As the first example of
this kind, consider the distribution of heights of eighteen-year-old
boys. The fitting of a normal curve to the heights of 300
eighteen-year-old Princeton freshmen¹ suggests that in general
the forces of nature are such as to cause a "normal" distribution
of heights. If this is assumed to be the case, then the normal
curve can be employed to calculate the probability of an
eighteen-year-old boy having a height lying within any given range.
This is done as follows.

As indicated above,² if the distribution of probability follows
the normal law, then the probability of an attribute ranging from
x/σ to x/σ + d(x/σ) is given by the formula

$$\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x/\sigma)^2}\, d(x/\sigma)$$

where x/σ represents a deviation of the attribute from the mean
attribute measured in σ units, σ is the standard deviation of the
distribution, and d(x/σ) is an infinitesimally small range. This
represents approximately the area under the curve for the infinitesimal
range x/σ to x/σ + d(x/σ). A finite range running, say,
from x₁/σ to x₂/σ can be conceived as made up of a number of
infinitesimal ranges of size d(x/σ), and the probability of an attribute
ranging from x₁/σ to x₂/σ is (by the addition theorem) merely the
sum of the probabilities for each of these infinitesimal ranges, i.e.,

$$\sum_{x_1/\sigma}^{x_2/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x/\sigma)^2}\, d(x/\sigma)$$

or, in the notation of the integral calculus,

$$\int_{x_1/\sigma}^{x_2/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x/\sigma)^2}\, d(x/\sigma)$$

¹ See pp. 295-306, and especially Fig. 101.
² See pp. 264-267.





In other words, the probability of an attribute ranging from
x₁/σ to x₂/σ is simply the area under the curve for this range.
This is graphically shown in Fig. 94 and is a direct result of the
addition theorem.

Fig. 94. Illustration of computation of probability of an x/σ lying between x₁/σ and x₂/σ.

The area under the normal curve for any given range might, as
indicated, be found by evaluating the "integral"

$$\int_{x_1/\sigma}^{x_2/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x/\sigma)^2}\, d(x/\sigma)$$

This is not an easy task, however, even for those who understand
advanced mathematics. Consequently, tables have been prepared
that give the approximate areas under the normal curve
for certain specified ranges and that permit by simple arithmetical
operations the calculation of areas for all other ranges. Such a
table is Table VI of the Appendix, page 693. This gives the
proportionate area under the positive half of the normal curve
from the mean (x/σ = 0) to various selected points. Thus from
the table it is seen that the proportion of the area lying under the
normal curve from x/σ = 0 to x/σ = 0.2 is 0.07926.

In addition, since the proportion of the area under the normal
curve from x/σ = 0 (the mean) to infinity is 0.50000, the proportion
of area under the curve from any selected point to infinity
can be readily calculated. Thus the proportion of the area under
the curve from x/σ = 0.2 to infinity is 0.42074 (i.e., 0.50000 −
0.07926), the proportion of area from x/σ = 1.96 to infinity is
0.02500 (i.e., 0.50000 − 0.47500), etc. Owing to the symmetry
of the curve, the same values hold true for areas from x/σ = 0 to
x/σ = −∞. Thus the proportion of the area for the range from
x/σ = −0.2 to x/σ = −∞ is 0.50000 − 0.07926 = 0.42074.

To find proportionate areas for other ranges, it is necessary
merely to add or subtract proportionate areas given directly by
the table. Thus, the proportion of area for the range x/σ = 0.2
to x/σ = 0.3 is the difference between the proportionate area
from x/σ = 0.3 to the mean and the proportionate area from
x/σ = 0.2 to the mean, i.e., 0.11791 − 0.07926 = 0.03865. Likewise,
the proportionate area under the curve for the range
x/σ = −0.2 to x/σ = +0.3 is simply the sum of the proportionate
area from x/σ = −0.2 to x/σ = 0 and the proportionate
area from x/σ = 0 to x/σ = 0.3, i.e., 0.07926 + 0.11791 = 0.19717.

Proportionate areas for obscure points not given directly or
indirectly by the table may be obtained by interpolation; usually,
straight-line interpolation (i.e., the calculation of simple proportionate
differences) gives satisfactory results.
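
The table operations just described can be imitated numerically. The sketch below is only an illustration: it uses Python's standard `math.erf` in place of Table VI, and the function name `area_from_mean` is an assumption introduced here, not a name from the text.

```python
import math

def area_from_mean(z):
    """Proportionate area under the normal curve from x/sigma = 0 to x/sigma = z."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2.0))

a02 = area_from_mean(0.2)      # area from the mean to 0.2
a03 = area_from_mean(0.3)      # area from the mean to 0.3
a196 = area_from_mean(1.96)    # area from the mean to 1.96

tail_02 = 0.5 - a02            # area from 0.2 to infinity
tail_196 = 0.5 - a196          # area from 1.96 to infinity
between = a03 - a02            # area from 0.2 to 0.3 (subtraction)
straddle = area_from_mean(-0.2) + a03   # area from -0.2 to +0.3 (addition)

for name, value in [("0 to 0.2", a02), ("0.2 to inf", tail_02),
                    ("1.96 to inf", tail_196), ("0.2 to 0.3", between),
                    ("-0.2 to 0.3", straddle)]:
    print(f"{name:12s} {value:.5f}")
```

Because of the symmetry of the curve, the function takes the absolute value of its argument, which mirrors the use of a table that covers only the positive half.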
To make use of Table VI in a given problem it is merely necessary
to convert the original measurements into deviations from
the mean expressed in σ units, i.e., to convert original units into
σ units. The mean height of eighteen-year-old boys, for example
(as estimated from the heights of eighteen-year-old Princeton
freshmen of the class of 1943), is 70.47 inches, and the standard
deviation of heights is 2.49 inches. Hence the probability of an
eighteen-year-old boy being 72 to 73 inches tall is given by the area
under the normal curve from

$$\frac{72 - 70.47}{2.49} = 0.61 \quad\text{to}\quad \frac{73 - 70.47}{2.49} = 1.02$$

This, in accordance with the method outlined in the previous
paragraph for calculating such an area, is 0.11707. Similarly, the
probability of a boy taller than 74 inches is given by the area
under the normal curve from (74 − 70.47)/2.49 = 1.42 to infinity,
which the table shows to be 0.50000 − 0.42220 = 0.07780.
Again, the probability that two boys picked at random should both be
taller than 74 inches is the product of the two individual probabilities
(the multiplication theorem for independent probabilities), or

(0.07780)(0.07780) = 0.00605
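
The height calculations can be reproduced along the following lines. This is a hedged sketch: the mean and standard deviation are those quoted in the text, deviations are rounded to two decimals because a printed table is entered that way, and `math.erf` stands in for Table VI. The helper names are hypothetical.

```python
import math

MEAN_HEIGHT = 70.47   # inches, from the text
SD_HEIGHT = 2.49      # inches, from the text

def area_from_mean(z):
    """Area under the normal curve from x/sigma = 0 to x/sigma = z (z >= 0)."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

def to_sigma_units(height):
    """Deviation from the mean in sigma units, rounded as for a table lookup."""
    return round((height - MEAN_HEIGHT) / SD_HEIGHT, 2)

z72, z73, z74 = (to_sigma_units(h) for h in (72, 73, 74))

p_72_to_73 = area_from_mean(z73) - area_from_mean(z72)
p_over_74 = 0.5 - area_from_mean(z74)
p_two_over_74 = p_over_74 ** 2   # multiplication theorem, independent draws

print(z72, z73, z74)
print(f"{p_72_to_73:.5f} {p_over_74:.5f} {p_two_over_74:.5f}")
```

Without the rounding of the deviations the results differ slightly in the later decimals, which is the usual price of working from a table with a fixed number of entries.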

Table VI thus readily facilitates the calculation of probabilities
whenever the primary distribution or distributions follow the



CHAPTER XI 


SYMMETRICAL BINOMIAL DISTRIBUTION AND THE 
NORMAL CURVE 

INTRODUCTION 

The preceding chapters have been concerned with probability 
and the probability calculus. These were discussed for the 
purpose of providing tools for subsequent analysis. In this 
chapter the tools will be employed in developing a theoretical 
explanation of the normal frequency curve. The line of attack 
will be as follows. 

The argument will begin with an abstract study of a simple
problem in combinatorial analysis. The basic data will be 10
coins, each of which has two sides. These sides will be marked
with a head or a tail, and each coin will have one head and one
tail.

The combinatorial problem will be the determination of the
relative frequencies or probabilities of various types of combinations
in the whole set of combinations that might be made
from various arrangements of the given set of coins. Thus
the theoretical problem will be the determination of the relative
frequencies or probabilities of combinations having 0, 1, 2, . . . ,
10 heads in the whole set of combinations that might be constructed
from various arrangements of the 10 coins.

In the terminology of probability, this combinatorial problem
consists of the derivation of a certain second-order probability
set from the elementary probability set. To put this in another
way, the problem is to find the type of frequency or probability
distribution that results from the combination of certain elementary
frequency or probability distributions. Attention will in
particular center upon the form of the derived frequency or
probability distribution. Exact and approximate formulas will
be determined for this distribution.

The purely theoretical part of the theory of the normal curve
will thus be a set of problems in the probability calculus. What
is of ultimate interest, however, is the explanation that this
distribution affords of some of the frequency distributions that
appear in real life, such as the frequency distributions of the
heights of adult white males, the frequency distribution of
samples from a given population, and the like. This explanation
will be undertaken after the completion of the combinatorial
analysis.

SYMMETRICAL BINOMUL DISTRIBUTION 
Derivation. As already suggested, the discussion of the theory
of the normal frequency curve will begin with the analysis of a
simple problem involving 10 coins. Each coin, it will be assumed,
has two sides, one of which is a head, the other a tail. Since the
probability of an object has been defined as its relative frequency
in the set of objects to which it belongs, it may be said that for
each coin the probability of a side being a head is 1/2 and the
probability of its being a tail is also 1/2. The problem to be
considered is this: If the 10 coins are combined in all possible
ways, the selection of a head or a tail for any one coin being
independent of the selection for other coins, what are the various
types of combinations of heads and tails that will be produced,
and what will be the probability of each type in the set of all
possible combinations? This is a straightforward problem in the
theory of combinations and may be solved as follows.

To facilitate the analysis let the 10 coins be distinguished
by the letters A, B, C, D, E, F, G, H, I, and J. A combination
having 0 heads, for example, will be represented as follows,

A B C D E F G H I J
T T T T T T T T T T

a combination having 4 heads as follows,

A B C D E F G H I J
H H H H T T T T T T

etc.

Consider first the combination having 0 heads. Since the
probability of a tail on each coin is 1/2, the probability of 0 heads
is (1/2)¹⁰. For the probability of A being a tail is 1/2, and the same
is true for B, C, D, E, F, G, H, I, and J. Furthermore, since the
probability of a tail for any one coin is independent of what
the other coins are, the probability of the above result is, by
the multiplication theorem, the product of the 10 independent
probabilities, or (1/2)(1/2)(1/2)(1/2)(1/2)(1/2)(1/2)(1/2)(1/2)(1/2) = (1/2)¹⁰. Finally,
it is to be noted that this result can be obtained in only one way.
Hence it is to be concluded that the probability of 0 heads is
1/2¹⁰ = 1/1,024.

Consider next the following combination:

A B C D E F G H I J
H T T T T T T T T T

This is a case of 1 head. Since the probability of A being
a head is 1/2 and the probability of each of the other coins being a
tail is also 1/2, and since each of these results is independent of the
others, it follows that the probability of this particular combination
of heads and tails is again (1/2)¹⁰. But there are also
other combinations having only 1 head. Such are

A B C D E F G H I J
T H T T T T T T T T
T T H T T T T T T T
T T T T T T T T T H

In fact, it is readily seen that there are 10 combinations
altogether, in each of which a different coin is the one being a
head. The probability, therefore, of any one of these 10 combinations,
i.e., the probability of a head on some one and only
one of the 10 coins, is, by the addition theorem, 10(1/2)¹⁰ = 10/1,024.

Consider now the combination

A B C D E F G H I J
H H T T T T T T T T

This is a case of 2 heads. Since the probability of A being a
head is 1/2, the probability of B being a head is 1/2, and the probability
of each of the other coins being a tail is likewise 1/2; and
since each of these results is independent of all the others, it
follows once more that the probability of this particular combination
is (1/2)¹⁰.

But, as previously, this is not the only combination having
2 heads. The reader himself will be able to write down a number
of other combinations in which only 2 heads appear. The
question is, how many different combinations of the 10 coins
have 2 and only 2 heads? This is answered by the theory of
permutations and combinations outlined in an earlier chapter.¹ Thus
the number of different combinations of 10 coins taken 2 at a
time is

$$\binom{10}{2} = \frac{10!}{2!\,8!} = 45$$

[Cf. Eq. (3), page 234.] There being, therefore, 45 different
combinations each of which has a probability of (1/2)¹⁰, it follows
that the probability of any one of them is

$$45\left(\frac{1}{2}\right)^{10} = \frac{45}{1{,}024}$$


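The probability of other possible combinations is determined
in a similar manner. In general, the probability of N₁ heads is

$$\frac{10!}{N_1!\,(10 - N_1)!}\left(\frac{1}{2}\right)^{10}$$

Thus the probability of 3 heads and 7 tails is

$$\frac{10!}{3!\,7!}\left(\frac{1}{2}\right)^{10} = \frac{120}{1{,}024}$$

The probability of 6 heads and 4 tails is

$$\frac{10!}{6!\,4!}\left(\frac{1}{2}\right)^{10} = \frac{210}{1{,}024}$$

etc. The results obtained by use of this formula may be tabulated
as follows: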

TABLE 19. PROBABILITIES OF VARIOUS COMBINATIONS AMONG ALL POSSIBLE
COMBINATIONS OF 10 COINS

Combinations Having    Probability
 0 heads                 1/1,024 = 0.00098
 1 head                 10/1,024 = 0.00977
 2 heads                45/1,024 = 0.04395
 3 heads               120/1,024 = 0.11719
 4 heads               210/1,024 = 0.20508
 5 heads               252/1,024 = 0.24609
 6 heads               210/1,024 = 0.20508
 7 heads               120/1,024 = 0.11719
 8 heads                45/1,024 = 0.04395
 9 heads                10/1,024 = 0.00977
10 heads                 1/1,024 = 0.00098

¹ See pp. 232-234.
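
The entries of Table 19 follow directly from the general formula and can be regenerated with a few lines of Python. The sketch below uses the standard library function `math.comb` for the number of combinations; the variable name `table` is an illustrative choice.

```python
from math import comb

N = 10  # number of coins

# For each number of heads k: the number of combinations and its probability.
table = [(k, comb(N, k), comb(N, k) / 2 ** N) for k in range(N + 1)]

for k, ways, prob in table:
    label = "head " if k == 1 else "heads"
    print(f"{k:2d} {label} {ways:3d}/1,024 = {prob:.5f}")
```

The numerators printed are the binomial coefficients of the tenth power, and the probabilities add up to unity because the numerators add up to 2¹⁰ = 1,024.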





It will be noted that the series of probabilities of 0, 1, 2, . . . ,
10 heads may be obtained by the expansion of (1/2 + 1/2)¹⁰. This
distribution of probability is consequently called a "binomial"
distribution.¹ If N coins had been used instead of 10, the
probabilities of the distribution would have been given by the
terms of the expansion of (1/2 + 1/2)ᴺ. Thus the probability of
a combination having N₁ heads among all possible combinations
of N coins is²

$$\frac{N!}{N_1!\,(N - N_1)!}\left(\frac{1}{2}\right)^{N}$$

or, if N₀ is set equal to N − N₁,

$$\frac{N!}{N_1!\,N_0!}\left(\frac{1}{2}\right)^{N}$$

This is the general formula for a symmetrical binomial distribution.

Character of the Symmetrical Binomial Distribution. A
graph of the probabilities given in Table 19 is presented in Fig.
95. It will be noted from the table and also from the figure that
the probability of 0 heads is the same as the probability of 10
heads, that the probability of 1 head is the same as the probability
of 9 heads, etc. In other words, the distribution of
probabilities is symmetrical about a central point, in this case
the point representing 5 heads. This symmetry is the reason
for the name "symmetrical" binomial distribution.

Mathematical analysis shows that in general the symmetrical
binomial distribution has the following characteristics:³

$$\text{Mean} = \frac{N}{2}, \qquad \sigma = \sqrt{\frac{N}{4}}, \qquad \beta_1 = 0, \qquad \beta_2 = 3 - \frac{2}{N} \qquad (2)$$

¹ Cf. p. 234.
² Ibid.
³ These formulas are derived in Smith and Duncan, Sampling Statistics,
pp. 65-67.



It will be sufficient to check these equations here by finding the
mean, standard deviation, β₁, and β₂ of the distribution of
Table 19.

Fig. 95. Graph of a symmetrical binomial distribution.

The mean of a distribution of probability, it will be recalled,¹
is equal to the sum of the attributes times their probabilities.
The mean of the distribution of Table 19 is thus

$$\begin{aligned}
&\frac{1}{1{,}024}(0) + \frac{10}{1{,}024}(1) + \frac{45}{1{,}024}(2) + \frac{120}{1{,}024}(3) + \frac{210}{1{,}024}(4) + \frac{252}{1{,}024}(5) \\
&\quad + \frac{210}{1{,}024}(6) + \frac{120}{1{,}024}(7) + \frac{45}{1{,}024}(8) + \frac{10}{1{,}024}(9) + \frac{1}{1{,}024}(10) = 5
\end{aligned}$$

According to the formula, the mean equals N/2 = 10/2 = 5, which
is the same value as that derived above by direct calculation.
Similarly, the variance of a distribution of probability is equal
to the sum of the deviations from the mean squared and multiplied
by their probabilities. Hence, the variance of the distribution
of Table 19 is

$$\begin{aligned}
\sigma^2 &= \frac{1}{1{,}024}(-5)^2 + \frac{10}{1{,}024}(-4)^2 + \frac{45}{1{,}024}(-3)^2 + \frac{120}{1{,}024}(-2)^2 + \frac{210}{1{,}024}(-1)^2 + \frac{252}{1{,}024}(0)^2 \\
&\quad + \frac{210}{1{,}024}(1)^2 + \frac{120}{1{,}024}(2)^2 + \frac{45}{1{,}024}(3)^2 + \frac{10}{1{,}024}(4)^2 + \frac{1}{1{,}024}(5)^2 = 2.5
\end{aligned}$$

This again checks with the formula, which gives σ² = N/4 = 2.5.

¹ See p. 169.

Likewise, the third moment about the mean of a probability
distribution is the sum of the deviations from the mean cubed
and multiplied by their probabilities, and the fourth moment is
the sum of the deviations from the mean raised to the fourth
power and multiplied by their probabilities. Thus, for the distribution
of Table 19,

$$\begin{aligned}
\mu_3 &= \frac{1}{1{,}024}(-5)^3 + \frac{10}{1{,}024}(-4)^3 + \frac{45}{1{,}024}(-3)^3 + \frac{120}{1{,}024}(-2)^3 + \frac{210}{1{,}024}(-1)^3 + \frac{252}{1{,}024}(0)^3 \\
&\quad + \frac{210}{1{,}024}(1)^3 + \frac{120}{1{,}024}(2)^3 + \frac{45}{1{,}024}(3)^3 + \frac{10}{1{,}024}(4)^3 + \frac{1}{1{,}024}(5)^3 = 0
\end{aligned}$$

and

$$\begin{aligned}
\mu_4 &= \frac{1}{1{,}024}(-5)^4 + \frac{10}{1{,}024}(-4)^4 + \frac{45}{1{,}024}(-3)^4 + \frac{120}{1{,}024}(-2)^4 + \frac{210}{1{,}024}(-1)^4 + \frac{252}{1{,}024}(0)^4 \\
&\quad + \frac{210}{1{,}024}(1)^4 + \frac{120}{1{,}024}(2)^4 + \frac{45}{1{,}024}(3)^4 + \frac{10}{1{,}024}(4)^4 + \frac{1}{1{,}024}(5)^4 = 17.5
\end{aligned}$$

Since, by definition, β₁ = μ₃²/μ₂³ and β₂ = μ₄/μ₂², it follows that for
this distribution β₁ is zero and β₂ = 17.5/(2.5)² = 2.8,
values again given by the general formulas. These formulas are
valid for all symmetrical binomial distributions.
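
The check of Eqs. (2) against Table 19 can also be carried out mechanically. The following sketch computes the moments with exact fractions; the function name `central_moment` and the other identifiers are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

N = 10
# Probability of each number of heads, as in Table 19.
probs = {k: Fraction(comb(N, k), 2 ** N) for k in range(N + 1)}

# Mean: sum of the attributes times their probabilities.
mean = sum(k * p for k, p in probs.items())

def central_moment(r):
    """Sum of the deviations from the mean raised to the power r, times their probabilities."""
    return sum((k - mean) ** r * p for k, p in probs.items())

mu2 = central_moment(2)   # variance
mu3 = central_moment(3)
mu4 = central_moment(4)

beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2

print(mean, mu2, mu3, mu4, beta1, beta2)
```

Changing `N` in the sketch gives a quick way to see β₂ = 3 − 2/N move toward 3 as the number of coins is increased.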

The Normal Curve. If 40 instead of 10 coins were involved,
the distribution of probability would be considerably more
spread out than that of Table 19. This is readily seen from
Fig. 96. In general, the formula σ = √N/2 indicates that the
dispersion of the distribution increases in proportion to √N.
If the horizontal scale is reduced, however, and the vertical scale
enlarged, in the same proportion that the dispersion of the distribution
is increased, then the effect of increasing N is to bring
the ordinates of the distribution closer together and to raise them
to the height of the original distribution. Under these conditions
the tops of the ordinates tend to sketch out a smooth curve
as N is increased. This is indicated in Fig. 97. It can be shown
that the limit that the symmetrical binomial distribution
approaches as N is increased, while at the same time the scales
are adjusted in proportion to √N, is the normal curve.

Fig. 96. Graphs of two symmetrical binomial distributions, one for N = 10, the other for N = 40.



That the symmetrical binomial distribution approaches the
normal curve as a limit can be definitely proved by rigorous
mathematical analysis.¹ Certain general considerations, however,
suggest this same conclusion.

1. The distributions of Figs. 95 to 97 have a shape similar to
that of the normal curve; and if a normal curve with the same
mean as any one of these distributions and the same standard
deviation is graphed together with that distribution, the curve
is seen to be a good "fit." This is shown² in Fig. 98.

Fig. 97. Illustration of effect of scale adjustments on a symmetrical binomial distribution.

¹ This is demonstrated in Smith and Duncan, Sampling Statistics, pp. 68-71.

² The binomial distribution is a discrete distribution, and its probabilities
are correctly represented by a series of ordinates as in Figs. 96 and 97. It
is the ordinates of the normal curve of Fig. 98 at 1/σ, 2/σ, etc., and not
sections of the curve area that approximate the binomial ordinates at these
points. As pointed out, however, in Smith and Duncan, Sampling Statistics,
p. 71, it is possible to represent any symmetrical binomial distribution by a
histogram whose area is approximated by that of a normal curve. In this
way the area tables of the normal curve can be used to approximate a series
of binomial probabilities.







2. Equations (2)¹ show that β₁ = 0 for the symmetrical
binomial distribution and that β₂ approaches the value 3 as N
is increased. These are also the values of β₁ and β₂ for the
normal curve.

3. If a graph is made of the symmetrical binomial distribution
in the form of a frequency polygon, the relative slope of any side
of this polygon at its mid-point is the same as the relative slope
of a normal curve at that point. Figure 99 shows, for example,
that for N = 10 the ordinate of the symmetrical binomial at
N₁ = 6 is equal to 210/1,024 and the ordinate at N₁ = 7 is
120/1,024. The mid-point between 6 and 7 is 6.5, and the
ordinate of the polygon at that point is

$$\frac{1}{2}\left(\frac{210}{1{,}024} + \frac{120}{1{,}024}\right) = \frac{165}{1{,}024}$$

The absolute slope of the polygon at this mid-point is given by
the ratio of the difference between the ordinates at 7 and 6 (that
is, 120/1,024 − 210/1,024 = −90/1,024) to the distance between the abscissa
points 6 and 7 (that is, 7 − 6 = 1); and the relative slope at the
mid-point is given by the ratio of the absolute slope to the ordinate
at that point. Thus, the relative slope of the polygon of
Fig. 99 at the mid-point 6.5 is

$$\frac{-\dfrac{90}{1{,}024}}{\dfrac{165}{1{,}024}} = -\frac{90}{165} = -0.545$$

In general, the relative slope of a symmetrical binomial distribution
at any mid-point N₁ + 1/2 is equal² to

$$-\frac{N_1 + \frac{1}{2} - \frac{N}{2}}{\dfrac{N + 1}{4}}$$

If x is set equal to N₁ + 1/2 − N/2, the deviation of the
mid-point from the mean N/2, and if σₖ² is set equal to (N + 1)/4,
this expression for the relative slope at point x becomes
−x/[(N + 1)/4], or −x/σₖ². But it can be shown³ by the differential
calculus that the relative slope of the normal curve at any point x
is −x/σ², where x = X − X̄. Hence the relative slope of the symmetrical
binomial distribution at any mid-point is the same as that of a
normal curve whose standard deviation is equal to σₖ = √(N + 1)/2,
which, if N is large, is practically the same as the standard
deviation of the symmetrical binomial distribution.⁴

¹ Page 283.
² See Smith and Duncan, Sampling Statistics, pp. 74-76.
³ Ibid.
⁴ See Eqs. (2), p. 283.
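
The slope argument of point 3 lends itself to a direct numerical check. In the sketch below, `polygon_relative_slope` follows the construction of the text (difference of adjacent ordinates divided by the ordinate at the mid-point), and `normal_relative_slope` evaluates −x/σ² with σ² = (N + 1)/4; both function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def polygon_relative_slope(N, n1):
    """Relative slope of the frequency polygon at the mid-point n1 + 1/2."""
    low = Fraction(comb(N, n1), 2 ** N)         # ordinate at n1
    high = Fraction(comb(N, n1 + 1), 2 ** N)    # ordinate at n1 + 1
    absolute_slope = (high - low) / 1           # abscissa distance is 1
    mid_ordinate = (low + high) / 2
    return absolute_slope / mid_ordinate

def normal_relative_slope(N, n1):
    """-x / sigma^2 for a normal curve with sigma^2 = (N + 1) / 4."""
    x = Fraction(2 * n1 + 1, 2) - Fraction(N, 2)   # deviation from the mean N/2
    variance = Fraction(N + 1, 4)
    return -x / variance

example = polygon_relative_slope(10, 6)
print(example, float(example))

# The two expressions agree at every mid-point, for several values of N.
agree = all(
    polygon_relative_slope(N, n1) == normal_relative_slope(N, n1)
    for N in (10, 40, 100)
    for n1 in range(N)
)
print(agree)
```

For N = 10 and the mid-point 6.5 the sketch returns −6/11, the exact value of the −0.545 quoted in the text.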

CONDITIONS PRODUCING THE SYMMETRICAL BINOMIAL AND
THE NORMAL CURVE IN REAL LIFE

The foregoing sections have been devoted to the derivation and
description of a particular frequency distribution known as the
binomial distribution. The analysis has consisted entirely of an
application of the probability calculus, and the result is an
abstract distribution of probability. Since the ultimate purpose
of the analysis is an explanation of some of the frequency distributions
of real life, it is desirable at this point to consider the
question: What is the relationship between the symmetrical
binomial distribution and a frequency distribution of real life?

Consider first the following hypothetical experiment. Suppose
that the 10 coins referred to in the theoretical discussion are
tossed a large number of times and the relative frequencies with
which they come up 0, 1, 2, . . . , 10 heads are computed.
What will be the results? Actually, no precise prediction can be
made. Intuition suggests, however, that if the coins are symmetrical
and are tossed in an unbiased fashion, the relative
frequencies with which the combinations 0, 1, 2, . . . , 10 heads
will appear will be close to the probabilities of these combinations
among the whole set of combinations that could be made from
10 coins. For if coins are tossed at random, it is to be expected
that a head will appear on any one coin as often as a tail. The
randomness also ensures that the appearance of a head on one
coin will be independent of the appearance of a head or a tail on
any other coin. Under these conditions it would seem likely
that any particular arrangement of heads and tails would occur
just as often as any other arrangement. Therefore, the relative
number of times 3 heads and 7 tails would appear, for example,
would be equal to the relative number of arrangements that
would yield 3 heads and 7 tails out of the set of all possible
arrangements. This is the relative frequency of the binomial
distribution. Intuition thus suggests the results of random
coin tossing will be closely approximated by the binomial frequencies.
Actual experiments lend support to this argument,
so that it would seem possible to predict the results of a large
number of tossings by the use of the probability calculus. This
is merely an application of the law of large numbers.

¹ Ibid.
² See Eqs. (2), p. 283.
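
The hypothetical experiment can be imitated with a pseudo-random number generator. The sketch below is an illustration only: the function name `toss_experiment`, the number of trials, and the seed are arbitrary assumptions, and a computer's generator merely stands in for unbiased tossing of real coins.

```python
import random
from math import comb

def toss_experiment(trials, n_coins=10, seed=0):
    """Toss n_coins symmetrical coins `trials` times; return the relative
    frequency of each possible number of heads."""
    rng = random.Random(seed)
    counts = [0] * (n_coins + 1)
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(n_coins))
        counts[heads] += 1
    return [c / trials for c in counts]

observed = toss_experiment(20_000)
expected = [comb(10, k) / 2 ** 10 for k in range(11)]

for k, (obs, exp) in enumerate(zip(observed, expected)):
    print(f"{k:2d} heads  observed {obs:.4f}  binomial {exp:.4f}")

largest_gap = max(abs(o - e) for o, e in zip(observed, expected))
```

The exact frequencies depend on the seed, but with a large number of trials the gap between observed and binomial frequencies is small, which is what the law of large numbers leads one to expect.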

The relationship between the results of coin tossing and the
binomial probabilities suggests even more important inferences.
For there may be conditions in real life that are similar to those
involved in the tossing of coins, and statistical variables produced
by these conditions may be expected to follow the symmetrical
binomial distribution and in special instances the normal curve.
To illustrate the conditions that might give rise to such results
consider the following examples:

Example 1. Suppose that the sex of the offspring of a certain animal is
determined by the type of the egg cell in the female that unites with the
sperm cell of the male, and suppose that the number of egg cells in
the female that will produce male offspring is on the average equal to the
number of egg cells that will produce female offspring. If sperm cells unite
with egg cells in a random manner, the chance is 1/2 of an offspring being a
male and 1/2 of its being a female. These are essentially the same conditions
that determine whether a symmetrical coin should turn up heads or tails
when tossed at random. Under such conditions the frequency distribution
of the number of males in families of a given size should theoretically follow
the symmetrical binomial distribution. Thus families of size 5 should be
expected to vary in sex combination as follows:

Number      Percentage of Families Having
of Males    Specified Number of Males
   0             1/32 = 0.03
   1             5/32 = 0.16
   2            10/32 = 0.31
   3            10/32 = 0.31
   4             5/32 = 0.16
   5             1/32 = 0.03

A study of the sex of pigs in 116 litters of 5 pigs each showed the following:

Number      Percentage of Litters Having
of Males    Specified Number of Males
   0             0.02
   1             0.17
   2             0.36
   3             0.30
   4             0.12
   5             0.03

The closeness of these figures to those above suggests that the theory of
sex determination outlined above might very well be valid for pigs.

Example 2. In Example 1 conditions were such as to produce a variable
(number of males) that was discrete and integral. The present hypothetical
example will suggest conditions which might produce a variable that was
discrete but not integral and that was distributed in the form of a symmetrical
binomial. It also suggests conditions under which the variable
might be practically continuous and distributed like a normal curve.

Suppose that there are a large number of bags of flour, say 10,000, each
weighing exactly 5 pounds. Suppose that an experimenter opens each bag
in succession and adds or subtracts a certain quantity of flour to each bag
in accordance with the following rule: Whenever he opens a bag, he also
tosses 10 coins; for each head that appears he adds an ounce of flour to the
bag; for each tail he subtracts an ounce. The result of this procedure will
be 10,000 bags of flour varying in weight from 5 pounds − 10 ounces to
5 pounds + 10 ounces, the unit difference being 2 ounces. In accordance
with the foregoing analysis the distribution of the weights of these bags of
flour would be approximately as follows:

Weight of Bag    Relative Frequency of Occurrence
4 lb.  6 oz.           1/1,024
4 lb.  8 oz.          10/1,024
4 lb. 10 oz.          45/1,024
4 lb. 12 oz.         120/1,024
4 lb. 14 oz.         210/1,024
5 lb.  0 oz.         252/1,024
5 lb.  2 oz.         210/1,024
5 lb.  4 oz.         120/1,024
5 lb.  6 oz.          45/1,024
5 lb.  8 oz.          10/1,024
5 lb. 10 oz.           1/1,024

In other words, the distribution of weights would approximately conform
to a symmetrical binomial distribution with a mean weight of 5 pounds and a
standard deviation of √2.5 × 2 = 3.16 ounces.

This shows how a variable may be produced that is discrete but not
integral and that is distributed in the form of a symmetrical binomial
distribution. To produce a variable that is practically continuous, it is
necessary to increase the number of coins from 10 to 100, say, and to reduce
the amount of flour added or subtracted to 0.01 ounce. Differences as
small as 0.02 ounce would thus be possible, and for all practical purposes
the weight of a bag of flour could be said to be continuous. Under these
conditions a graph of the distribution of weights would be practically continuous
and, as indicated in the theoretical discussion, would have the form of a normal
frequency curve.
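
The mean and standard deviation of the flour-bag weights can be obtained exactly from the binomial probabilities. The sketch below works in ounces (5 pounds = 80 ounces); the constant and variable names are illustrative assumptions.

```python
from math import comb, sqrt

N_COINS = 10
BASE_OUNCES = 80      # 5 pounds expressed in ounces
OUNCES_PER_COIN = 1   # one ounce added per head, one subtracted per tail

# Weight of a bag when k of the 10 coins come up heads.
weights = [BASE_OUNCES + OUNCES_PER_COIN * (k - (N_COINS - k))
           for k in range(N_COINS + 1)]
probs = [comb(N_COINS, k) / 2 ** N_COINS for k in range(N_COINS + 1)]

mean_weight = sum(w * p for w, p in zip(weights, probs))
sd_weight = sqrt(sum((w - mean_weight) ** 2 * p for w, p in zip(weights, probs)))

print(weights)
print(mean_weight, round(sd_weight, 2))
```

Each additional head changes the weight by 2 ounces (one ounce added instead of one subtracted), which is why the standard deviation is twice the √2.5 of the number of heads.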

Example 3. Example 2 was entirely hypothetical. An apparatus has been
constructed, however (see Fig. 100), that reproduces in somewhat different
form the conditions of Example 2. By its use the results predicted in
Example 2 can be concretely illustrated.

Fig. 100. The Pearson-Galton apparatus for physical illustration of the binomial distribution.

The apparatus of Fig. 100 was devised many years ago by Sir Francis Galton
and subsequently elaborated by Karl Pearson.¹ As represented in Fig. 100 it
consists of a series of rows of wedges, each row containing an additional
wedge and so arranged that its wedges come halfway between the wedges of
the row above. If the wedges are placed 1 centimeter apart, then a small
ball dropped into the top of the machine will have an equal chance in each
row of being deflected 0.5 centimeter either to the left or the right. The
apparatus of Fig. 100 has 10 rows. The final deviation of the ball from the
central point O will thus be the algebraic sum of the left (minus) and right
(plus) deflections as it falls through the 10 rows. The possible range
of this final deviation is from −5 to +5 centimeters. Since the probability
of a plus and minus deviation of 0.5 centimeter is in each row equal to 1/2
(similar to the probability of a head and a tail for a coin) and since there
are 10 rows (as there were 10 coins in the previous case), the probabilities of
final deviations of −5, −4, −3, −2, −1, 0, +1, +2, +3, +4, +5 centimeters
will be the same as those of the binomial distribution

$$\frac{10!}{N_1!\,(10 - N_1)!}\left(\frac{1}{2}\right)^{10}$$

which are given in Table 19, page 282.

¹ Galton, Francis, Natural Inheritance (Macmillan & Company, Ltd.,
London, 1889), p. 63; Pearson, Karl, "Skew Variation in Homogeneous
Material," Philosophical Transactions of the Royal Society of London, Series
A, Vol. 186 (1895), p. 343. Pearson's contribution was to replace the set of
nails used by Galton by a set of sliding wedges that could be so adjusted
that the chances of deflection to the left and right were not equal. Figure
100 follows the pattern of Galton's apparatus.

These are the theoretical probabilities of the apparatus. If a large
number of balls are actually dropped into the machine, the exact result
cannot be predicted. Intuition suggests, however, that the relative frequencies
with which the balls will pile up in the different slots will tend to
approximate the theoretical probabilities, and this is demonstrated by actual
experiments. Such a result is pictured in Fig. 100 by the shading of the
slots in proportion to the binomial probabilities.

It will be noted that in this case the variable, that is, the final deviation
of a ball from the central point O, is again discrete. Deviations of integral
centimeters only are possible. If, however, the number of rows were
increased from 10 to 1,000, say, and if at the same time the wedges were
reduced to 0.01 centimeter in size and placed so that they were only 0.01
centimeter apart (the balls would, of course, have to be correspondingly
reduced in size), then the final deviations would vary by 0.01 centimeter
and might be practically considered a continuous variable. The distribution
of relative frequencies would in this case closely approximate a
smooth frequency curve, which would once again be the normal curve.

Theory of Errors. Errors in physical measurements may be broken up
into several components. (1) The “instrumental error” may be attributed
to the particular instrument with which the measurement is made; every
measurement by it will contain a certain error that may be assigned to that
instrument. (2) The “personal error” may be attributed to the particular
person undertaking the measurement; every observation by him will be
influenced by his “personal equation.” (3) Another component error may
be attributed to particular external conditions such as the temperature,
sunlight, and wind. These errors due to the instrument, the observer, and
specific external conditions are all “systematic errors” that can be allowed
for. (4) A final component error is the “accidental error,” or “residual
error,” to which no definite cause can be assigned. Such errors are the
result of the whole host of chance forces, the same sort of forces that deter-
mine whether an unbiased coin comes up heads or tails. The total acci-
dental error in any individual measurement may be taken to be the sum
of a number of small accidental errors arising from different causes.¹ Slight
irregular changes in external conditions, such as the vibration on account of
air currents or irregular changes in the personal equation of the observer,
are examples of causes for accidental error of measurement. If it is pos-
sible to discover the law of action of any error, it is thereby removed from
the class of accidental errors to the class of systematic errors.

If the number of forces affecting the residual errors in any series of
measurements is large, if each causes a very small plus or minus deviation
from the true value, and if the probability of a plus and a minus deviation
is for each force equal to ½, then, as in the case of the flour-bag experiment
and the Pearson-Galton apparatus, the final residual errors of the series of
measurements will tend to be distributed in accordance with the normal
curve. The mean of this curve will be the true value (after allowance has

¹ Brunt, David, The Combination of Observations, pp ^-4





been made, of course, for the systematic errors mentioned above), and the
standard deviation of the curve will be an index of the precision of measure-
ment. This is the theory of errors.¹ It is supported by the close agreement
between the normal curve and distributions of actual measurements. In
fact, the normal curve is often spoken of as the “error curve” or the “Gaus-
sian error curve,” after the man who was among the first to recognize the
possibility of applying the theory of probability to the investigation of the
errors of measurement.²

Summary of Conditions Leading to the Symmetrical Binomial
Distribution and the Normal Curve. The foregoing examples
suggest that whenever the following conditions exist in real
life, the data generated by these conditions will tend to be dis-
tributed in the form of a symmetrical binomial distribution and,
if certain other conditions are also present, in the form of a
normal curve. The conditions giving rise to the symmetrical
binomial distribution may be stated as follows:

1. In the absence of certain “causes” of variation or in the 
event of a perfect balancing of their effects, the data assume a 
fixed central value. (The 5 pounds of the flour illustration, the 
“true value” in a series of measurements.) 

2. Deviations from this central value result from certain 
“causes” of variation, the effect of any “cause” being either to 
add a fixed quantity to the data or to subtract the same quantity. 
(To add or subtract 1 ounce of flour or to add or subtract an 
“error” of 0.5 centimeter.) 

3. A “cause” of variation tends to produce positive effects and
negative effects in equal proportion, that is, P(+) = P(-) = ½.
(The probability of a head equals the probability of a tail; the
probability of a positive error equals the probability of a negative
error.)

4. The effects of all contributory causes of variation are of
equal magnitude. (Each adds or subtracts 1 ounce of flour
or 0.5 centimeter.)

¹ Actually this is a special case of a more general statement of the theory.
As pointed out in Smith and Duncan, Sampling Statistics, p. 97, each force
may cause deviations of varying size with varying probabilities and the
final residual errors will still tend to be normally distributed provided that
the number of forces is very large and the relative importance of each is
about the same.

² For Gauss’s fundamental works see Abhandlungen zur Methode der
kleinsten Quadrate (A. Börsch and P. Simon, Berlin, 1887).




5. The contributory causes are independent in their action.
In other words, the contribution of a positive or negative effect
by any causal factor is independent of the previous contributions
of other causal factors.

6. The total deviation of any element from its central value
is the algebraic sum of the positive and negative contributions of
the individual causal factors. (The total amount of flour added
or subtracted from a bag is the sum of the ounces added for each
head tossed minus the ounces subtracted for each tail tossed.)

If in addition to these conditions the following also exist, then
the resulting distribution will tend to conform to the normal
curve:

7. The number of contributory causes is very large. (A
large number of coins are tossed; the binomial machine contains
a large number of rows.)

8. The positive and negative contributions of each cause are
very small. (If 0.01 ounce is added or subtracted instead of 1
ounce; if 0.005 centimeter, instead of 0.5 centimeter.)
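
The effect of conditions 7 and 8 can be checked numerically. The sketch below, which assumes an even number of equally likely plus and minus contributions, rescales the central ordinate of the symmetrical binomial distribution to standard-deviation units and compares it with the central ordinate of the standard normal curve, 1/√(2π) ≈ 0.3989. The function name `central_ordinate` is an illustrative choice only.

```python
from math import comb, pi, sqrt

def central_ordinate(causes):
    """Central probability of a symmetrical binomial with an even number of
    causes, multiplied by its standard deviation (in units of one step)."""
    if causes % 2:
        raise ValueError("use an even number of causes so that a center exists")
    sigma = sqrt(causes) / 2                         # s.d. of the number of plus effects
    p_center = comb(causes, causes // 2) / 2 ** causes
    return p_center * sigma

normal_center = 1 / sqrt(2 * pi)                     # ordinate of the normal curve at 0
for causes in (10, 100, 1000):
    print(causes, round(central_ordinate(causes), 5), round(normal_center, 5))
```

As the number of causes grows, the rescaled binomial ordinate rises steadily toward the normal value, which is the sense in which the binomial "tends to conform" to the normal curve.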

It is to be noted that so far as the normal curve is concerned
not all these conditions are necessary for its generation. The
above conditions will produce it, but the normal curve may also
occur when some of these conditions are absent.¹ It may be
stated here that the normal curve will still be produced if con-
ditions 2 and 3 are relaxed so that a causal factor may affect the
data in varying degree and with varying probabilities and also if
condition 4 is only approximately and not exactly true.² The
most important conditions are 6 to 8 and condition 4 in an
approximate form. For example, in the case of the flour illus-
tration, the resulting weights of the bags of flour would still tend
to be normally distributed even if biased dice instead of unbiased
coins were used and if the amount of flour added or subtracted
varied with the result of the throw (say 0.001 ounce for the
occurrence of a one, -0.002 ounce for the occurrence of a two,
0.003 ounce for the occurrence of a three, -0.004 for the occur-
rence of a four, etc.), provided that the number of dice thrown
was very large and the amount added or subtracted per die was
very small and of about the same order of magnitude from die to

¹ See Smith and Duncan, Sampling Statistics, pp. 97-100.

² Under certain conditions the requirement of independence (condition 5)
may also be relaxed. See Smith and Duncan, Sampling Statistics, pp. 63-65.




die. The normal curve is thus a more general phenomenon than
the symmetrical binomial distribution.¹
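
The biased-dice illustration can be examined without any actual throwing. The following sketch assumes that the throws are independent, so that the cumulants of the total are simply n times those of a single die; it then gives the skewness measure β₁ and the kurtosis β₂ of the total amount of flour added. The amounts for a five and a six (0.005 and -0.006 ounce) and the biased probabilities are assumptions introduced for illustration, since the text gives only the first four amounts followed by "etc."

```python
def single_die_cumulants(values, probs):
    """Second, third, and fourth cumulants of one throw."""
    mean = sum(p * v for v, p in zip(values, probs))
    m2 = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    m3 = sum(p * (v - mean) ** 3 for v, p in zip(values, probs))
    m4 = sum(p * (v - mean) ** 4 for v, p in zip(values, probs))
    return m2, m3, m4 - 3 * m2 ** 2

def betas_of_total(values, probs, n_dice):
    """beta_1 and beta_2 of the sum of n_dice independent throws."""
    k2, k3, k4 = single_die_cumulants(values, probs)
    beta1 = (k3 ** 2 / k2 ** 3) / n_dice       # shrinks in proportion to 1/n
    beta2 = 3 + (k4 / k2 ** 2) / n_dice        # approaches the normal value 3
    return beta1, beta2

# Ounces added for faces one to six; the last two are assumed values.
amounts = [0.001, -0.002, 0.003, -0.004, 0.005, -0.006]
# Hypothetical biased die; the probabilities are assumptions and sum to 1.
bias = [0.25, 0.15, 0.20, 0.10, 0.20, 0.10]

for n in (1, 100, 10000):
    print(n, betas_of_total(amounts, bias, n))
```

With a single die the distribution is far from normal, but with many thousands of dice both measures are practically indistinguishable from the normal values of 0 and 3.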

Examples of Normal Frequency Distributions. Natural forces
appear to generate normal frequency distributions in many fields.
Physical measurements have already been mentioned. Figure
101 shows the distribution of heights of 300 eighteen-year-old
Princeton freshmen. The grades of students on examinations,
hourly earnings of workers, the length of life of electric-
light bulbs, the distance of baseball throws of first-year high-
school girls are all normally distributed variables. In these
fields and in many others, it would seem that the conditions of
variation are those which theoretically give rise to the normal
curve.


Fig. 101. Normal curve fitted to heights of 300 Princeton freshmen.
(Horizontal axis: height in inches.)

DETERMINATION OF NORMALITY 

Several procedures are available for determining whether the
population from which a given set of sample data has been taken
might reasonably be considered to conform to the normal curve.
In general, these consist of comparing the histogram constructed
from the sample data with a normal curve “fitted” to this histo-
gram. The difference in the various procedures lies in the bases

¹ Mathematically the normal curve can be derived from a great variety of
different assumptions. See, for example, Czuber, Emanuel, Theorie der
Beobachtungsfehler (B. G. Teubner, Leipzig, 1891).




of comparison. Several of the more important procedures will
now be discussed.

Graphic Comparison. The simplest method of determining
whether the assumption of normality is or is not reasonable is
to graph the histogram and normal curve together and see how
well the curve fits. The test here is purely a subjective one,
but in many cases when the fit is exceptionally good or excep-
tionally bad this is probably sufficient for acceptance or rejection
of the hypothesis.

In making a graphic comparison of a sample histogram and a
normal curve, it is necessary to determine what mean and what
standard deviation should be assigned to the curve. Offhand
the simplest procedure would appear to be the assignment to
the curve of the mean and standard deviation of the histogram,
for presumably these are the best estimates that may be made
of the mean and standard deviation of the population from which
the sample was taken.¹ It will be recalled, however, that in the
calculation of the mean and standard deviation of the histogram
the data were distributed among various classes or groups and all
the cases in any class interval were assumed to be concentrated
at the mid-point of the interval. But the population is pre-
sumably distributed in the form of a smooth curve, so that, in
estimating its mean and standard deviation from that of the
histogram, allowance must be made for the grouping of the data
in the construction of the histogram. In any interval a smooth
bell-shaped curve, such as the normal curve, will have more cases
that are on the side toward the mean than on the side away from
the mean. The assumption that all cases are concentrated at
the mid-point of an interval will not cause any appreciable error
in the mean calculated from grouped data, for plus and minus
deviations will offset each other, but it will cause the standard
deviation of the grouped data to be greater than the standard
deviation of the smooth curve that represents the true distribu-
tion of the data. Some adjustment should therefore be made in
the standard deviation of the histogram before it is taken as an
estimate of the standard deviation of the population.

The adjustment that must be made for grouping has been
determined by W. F. Sheppard. He has shown that under con-
ditions that are true for a normal distribution the variance
¹ Cf. pp. 318 and 319.





σ² of the smooth curve is approximately equal to the variance of
the grouped data minus one-twelfth the square of the class inter-
val.¹ In other words, if μ₂ (uncorrected) is the second moment
(= σ²) of the grouped data about its mean and μ₂ is the second
moment of the smooth curve about its mean, then

$$\mu_2 = \mu_2\ (\text{uncorrected}) - \tfrac{1}{12}\,i^2$$

The quantity $\tfrac{1}{12}i^2$, where i is the class interval, is
Sheppard’s correction for grouping that is required for estimating
the standard deviation of the fitted normal curve.
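
Sheppard's correction is easily applied in practice. The following is a minimal sketch, assuming that the grouped data are supplied as class mid-points with their frequencies and that all class intervals have the same width i. The function name and the small frequency table are hypothetical and are used only to show the arithmetic.

```python
from math import sqrt

def sheppard_corrected_sd(midpoints, frequencies, interval):
    """Return the mean, the uncorrected s.d., and the Sheppard-corrected s.d."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    # Variance of the grouped data, every case taken at its class mid-point.
    var_grouped = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / n
    # Sheppard's correction: subtract one-twelfth the square of the class interval.
    var_corrected = var_grouped - interval ** 2 / 12
    return mean, sqrt(var_grouped), sqrt(var_corrected)

# Hypothetical grouped data with a class interval of 1.
mids = [1.0, 2.0, 3.0]
freqs = [1, 2, 1]
mean, sd_uncorrected, sd_corrected = sheppard_corrected_sd(mids, freqs, 1.0)
print(mean, round(sd_uncorrected, 4), round(sd_corrected, 4))
```

The corrected standard deviation is always the smaller of the two, in agreement with the argument that grouping at mid-points overstates the spread of a bell-shaped distribution.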

In fitting a normal curve to a sample histogram, therefore, the
mean of the curve is taken equal to the mean of the histogram
and the variance of the curve is taken equal to the variance of the
histogram minus $\tfrac{1}{12}i^2$. In plotting the curve a table of the
ordinates of the standard normal curve may conveniently be used.
If the histogram to which the curve is to be fitted is of the usual
type, that is, if it consists of a series of rectangles of which the
heights measure aggregate frequencies and if the intervals on
which these rectangles are erected are laid off in terms of original
X units, then ordinates of the standard normal curve can be
taken to represent the particular normal curve desired by making
certain simple adjustments. The ordinates of the standard
normal curve, it will be recalled, are given for values of X that
are measured from the mean of the distribution and are expressed
in terms of standard deviation units. It will also be recalled that
the area of the curve over any given interval measures the relative
frequency of cases falling in this interval. To make these
ordinates represent a normal curve with a given mean and a
given standard deviation, they need only be plotted so that the
ordinate for $x/\sigma = 0$ comes at the specified mean value and
ordinates for other values of $x/\sigma$ come at $X = \bar{X} + x$. To
put them on the same basis as the histogram, however, they must
also all be multiplied by $Ni/\sigma$. This is because the total area
of the histogram² is Ni and that of the standard normal curve is
1 (that is, 100 per cent), whereas the abscissa scale on which the
histogram is plotted is σ times the abscissa scale of the standard
curve. This use of the ordinates of the standard normal curve

¹ Cf. Proceedings of the London Mathematical Society, Vol. 29, pp. 353-380.

² The area of any one rectangle is $F_i$, and the total area is therefore
$\sum F_i = Ni$.





may be illustrated by fitting a normal curve to the heights of
300 Princeton freshmen.

In Table 20 the mid-points of the various class intervals into
which the 300 heights were distributed are set down in column
(1). In column (2) the difference between these mid-points and
the mean of the distribution ($\bar{X}$ = 70.47) is computed, and in
column (3) this is divided by the adjusted standard deviation.
The results are the various values of $x/\sigma$ that correspond to the
mid-points of the various class intervals. The ordinates of the
standard normal curve at these values of $x/\sigma$ are then computed
from Table VII (see Appendix, page 694) and entered in column
(4). Finally, in column (5), these standard ordinates are multi-
plied by $Ni/\sigma = (300)(1)/2.47$ to put them on a par with the
sample histogram.
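
The calculation of columns (2) to (5) may be sketched as follows. The sketch computes the standard ordinate from the formula of the normal curve instead of reading it from Table VII, and it assumes a class interval of 1 inch, as the spacing of the mid-points suggests. The function name `fitted_ordinate` is illustrative.

```python
from math import exp, pi, sqrt

def fitted_ordinate(x_mid, mean, sd, n_cases, interval):
    """Ordinate of the fitted normal curve at a class mid-point, on the
    same scale as a histogram of aggregate frequencies."""
    z = (x_mid - mean) / sd                            # column (3): x / sigma
    standard_ordinate = exp(-z * z / 2) / sqrt(2 * pi) # column (4)
    return standard_ordinate * n_cases * interval / sd # column (5): times Ni/sigma

# Values taken from the text: 300 heights, mean 70.47, corrected s.d. 2.47.
for x_mid in (62.5, 66.5, 70.5, 74.5):
    print(x_mid, round(fitted_ordinate(x_mid, 70.47, 2.47, 300, 1.0), 2))
```

Because the table rounds x/σ to two decimal places before the ordinate is looked up, figures computed in this way may differ slightly from those printed in column (5).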

χ² Test of Goodness of Fit. Another method of comparing a
sample histogram with a normal curve is to compare the fre-
quencies given by the two, interval by interval. Whereas the
previous method was primarily subjective in that a conclusion
had to be reached from a mere inspection of the two graphs, com-
parison of the histogram and the curve, interval by interval,
yields a numerical criterion of “goodness of fit.” A procedure
that has found favor because it permits a comparison with chance
results is to take the difference between the absolute frequencies¹
given by the curve and by the histogram for each interval, square
these differences, divide each by the frequency of the curve for
that interval, and finally sum the results. The quantity so
calculated may be represented by $\sum \frac{(F - f)^2}{f}$, where F repre-
sents for each class interval the frequency given by the histogram
and f the frequency given by the curve.

Sampling theory shows² that, if this quantity is calculated for
a large number (theoretically, an infinite number) of sample his-
tograms from the same normal population, then the distribution
of these various sample values of $\sum \frac{(F - f)^2}{f}$ will be adequately
represented by a probability curve known as the “χ² curve” and
this can be used to determine the probability of a larger value of

¹ For the curve, this means the relative frequencies times N, the total
number of cases in the sample.

² On the χ² distribution see Smith and Duncan, Sampling Statistics, pp.
111-119 n„/l YTTT *




TABLE 20. CALCULATION OF THE ORDINATES OF THE NORMAL CURVE
THAT FITS THE DISTRIBUTION OF HEIGHTS OF 300 PRINCETON FRESHMEN

   (1)       (2)         (3)          (4)               (5)
    X     X - X̄ = x      x/σ      Ordinate of     Col. (4) × Ni/σ
                                 standard curve

   62.5    -7.97       -3.22        0.00224            0.27
   63.5    -6.97       -2.82        0.00748            0.91
   64.5    -5.97       -2.42        0.02134            2.59
   65.5    -4.97       -2.01        0.05292            6.43
   66.5    -3.97       -1.59        0.11270           13.69
   67.5    -2.97       -1.19        0.19652           23.87
   68.5    -1.97       -0.80        0.28969           35.19
   69.5    -0.97       -0.39        0.36973           44.91
   70.5     0.03        0.01        0.39892           48.45
   71.5     1.03        0.42        0.36526           44.36
   72.5     2.03        0.82        0.28504           34.62
   73.5     3.03        1.22        0.18954           23.02
   74.5     4.03        1.63        0.10567           12.83
   75.5     5.03        2.04        0.04980            6.05
   76.5     6.03        2.44        0.02033            2.47
   77.5     7.03        2.84        0.00707            0.86

   X̄ = 70.47        σ (corrected) = 2.47


$\sum \frac{(F - f)^2}{f}$ by chance. If the probability is a large one, then
the difference between the given sample histogram and the
normal curve, as measured by $\sum \frac{(F - f)^2}{f}$, may reasonably be
attributed to chance; the curve may be deemed a good fit, and
the population from which the sample was drawn may tenta-
tively be taken as normal. If the probability is very small,
however, say less than 0.05, then the difference between the
histogram and the curve is to be attributed to something else
than chance, presumably to the nonnormality of the population
from which the sample was drawn. In this case, the normal
curve is not deemed a good fit, and the hypothesis of a normal
population is rejected. Owing to its use of the χ² curve, this
second method of comparison is called the “χ² test of goodness
of fit.”
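
The computation of the criterion may be sketched as follows. The frequencies used here are hypothetical and serve only to show the arithmetic; they are not the Princeton data. F denotes the histogram frequencies and f the frequencies given by the fitted curve, as in the text.

```python
def chi_square_statistic(histogram_freqs, curve_freqs):
    """Sum over the class intervals of (F - f)^2 / f."""
    if len(histogram_freqs) != len(curve_freqs):
        raise ValueError("both lists must cover the same class intervals")
    return sum((F - f) ** 2 / f for F, f in zip(histogram_freqs, curve_freqs))

# Hypothetical frequencies for three class intervals.
observed = [10, 20, 30]
fitted = [12.0, 18.0, 30.0]
print(round(chi_square_statistic(observed, fitted), 4))
```

A value of zero would mean that the histogram and the curve agree exactly in every interval; the larger the value, the poorer the fit.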

The χ² test may be illustrated, as in the previous case, by the
distribution of heights of 300 Princeton freshmen. The numeri-





* The items in this column are obtained by subtracting from 0.50000 the figures found for each -(X - X̄)/σ and by adding to 0.50000 the figures
found for each +(X - X̄)/σ in Table VI of the Appendix, p. 693.




the first column is equal to the number of class intervals minus 3.¹
The figures in the second column represent values of $\sum \frac{(F - f)^2}{f}$
for which there is a probability of 0.05 that an equal or greater
value would be obtained by mere chance. For example, in the
present instance, n = 11 - 3 = 8, and Table 22 shows that, if
the data were truly normal, sample values for $\sum \frac{(F - f)^2}{f}$ that
were equal to or greater than 15.51 would be obtained only
5 times out of 100 for such a value of n. Since the computed
value of $\sum \frac{(F - f)^2}{f}$ = 3.807, the chances of an equal or greater


TABLE 22. CRITICAL VALUES FOR Σ(F - f)²/f*

          Values of Σ(F - f)²/f for Which the
          Probability of an Equal or Greater
    n     Value Is Just 0.05

    1        3.84
    2        5.99
    3        7.81
    4        9.49
    5       11.07
    6       12.59
    7       14.07
    8       15.51
    9       16.92
   10       18.31
   11       19.67
   12       21.03
   13       22.36
   14       23.68
   15       25.00
   16       26.30
   17       27.59
   18       28.87
   19       30.14
   20       31.41

* Abridged from Table III, “Table of χ²,” in R. A. Fisher, Statistical Methods for Research
Workers, Oliver & Boyd, Ltd., Edinburgh, by the kind permission of the publishers and
author.


¹ See Smith and Duncan, Sampling Statistics, pp. 327-328, for an explana-
tion of the significance of n in this case.





value is much more than 0.05. Hence the curve is deemed a 
good fit, and the distribution of heights may be said to be normal. 
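
How much more than 0.05 the probability is can be computed directly when n is an even number, for then the area under the χ² curve beyond a given value has a simple closed form. The following sketch uses that closed form; the function name `chi_square_upper_tail` is illustrative, and the restriction to even n is a limitation of this shortcut and not of the test itself.

```python
from math import exp, factorial

def chi_square_upper_tail(value, n):
    """Probability of an equal or greater value of chi-square, for even n."""
    if n <= 0 or n % 2:
        raise ValueError("this closed form holds only for positive even n")
    half = value / 2
    return exp(-half) * sum(half ** j / factorial(j) for j in range(n // 2))

# Figures from the text: n = 8, critical value 15.51, computed value 3.807.
print(round(chi_square_upper_tail(15.51, 8), 3))   # close to 0.05
print(round(chi_square_upper_tail(3.807, 8), 3))   # much more than 0.05
```

For the critical value of Table 22 the probability comes out very nearly 0.05, while for the computed value of 3.807 it is well above one-half, so that a discrepancy as great as that observed would be a common chance occurrence.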

Comparison of Special Statistics. Although the test just out-
lined is very commonly used, it has certain weaknesses as a test
of normality. (1) It should be noted that the squaring of the
differences between the group frequencies removes any signifi-
cance that might be attributed to the signs of the differences.
For example, it might happen in a given case that all the histo-
gram frequencies to the left of the center were larger than the
normal curve frequencies and that all the histogram frequencies
to the right of the center were less than the normal curve fre-
quencies, indicating a well-marked positive skewness; neverthe-
less, if the absolute values of these differences were all small, the
χ² test might not indicate any departure from normality. (2)
The necessity of combining the extreme intervals into larger
groups causes a loss of information and reduces the number of
points of comparison.¹ For these reasons, other methods of test-
ing for normality have been proposed.

If a set of sample data actually has come from a normal popula-
tion, it is to be expected that its skewness will be slight and its
kurtosis close to the normal kurtosis of 3. It would also be
expected that the ratio of its average deviation to its standard
deviation would be somewhere in the neighborhood of the value of
this ratio for the normal curve (i.e., 0.7979). The departure of
the actual values of these sample statistics from the theoretical
values for the normal curve can thus be used as a test for normality.
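
These three statistics may be computed from ungrouped data as in the sketch below. It assumes the usual moment definitions, β₁ = μ₃²/μ₂³ and β₂ = μ₄/μ₂², and takes the ratio a as the average deviation about the mean divided by the standard deviation. The data are artificial, drawn from a normal population by a random-number generator, and are not the Princeton heights.

```python
import random
from math import pi, sqrt

def normality_statistics(data):
    """Return beta_1, beta_2, and a (average deviation / standard deviation)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    average_deviation = sum(abs(x - mean) for x in data) / n
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2, average_deviation / sqrt(m2)

rng = random.Random(0)                                # fixed seed for repeatability
artificial = [rng.gauss(70.0, 2.5) for _ in range(20000)]
beta1, beta2, a = normality_statistics(artificial)
print(round(beta1, 4), round(beta2, 3), round(a, 4))
print(round(sqrt(2 / pi), 4))                         # theoretical a for the normal curve
```

For a large normal sample the three statistics fall close to 0, 3, and 0.7979, the last being the value of √(2/π).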

For the 300 Princeton freshmen, β₁, β₂, and the ratio of average
deviation to standard deviation (indicated by the symbol a) had
the values² β₁ = 0.023, β₂ = 3.021, and a = 0.805. These are

¹ Its practical effect is to reduce the value of n to be used in the table.

² No account was taken of Sheppard’s correction in computing these
values. The average deviation used in making this test was computed from
the mean by the formula

$$\text{A.D.} = \frac{i}{N}\left[\sum F\,|d'| + c\,(N_1 - N_2) + N_0\left(\tfrac{1}{4} + c^2\right)\right]$$

where $c = (\bar{X} - A)/i$, $N_1$ = number of cases in intervals below the arbitrary
origin, $N_2$ = number of cases in intervals above the arbitrary origin, and
$N_0$ = number of cases in interval containing arbitrary origin. (A must be
in the same interval as $\bar{X}$.) Cf. Geary, R. C., and E. S. Pearson, Tests of
Normality, p. 4.




all very close to the values 0, 3, and 0.7979 of a truly normal dis-
tribution. Hence this last, as well as the other tests, suggests that
heights are normally distributed.

Sometimes the sample values of β₁, β₂, and a are not so close to
the normal values as in the foregoing illustration. In such
instances use may be made of tables published in Tests of Normal-
ity,¹ by R. C. Geary and E. S. Pearson. These tables give, for
various-sized samples, the sample values of β₁, β₂, and a, for
which the probability of a greater value is 0.05, and 0.01, respec-
tively. For β₂ and a they also give values of these statistics for
which the probability of a smaller value is 0.05 and 0.01, respec-
tively. If, in any given instance, the sample value of β₁, β₂, or a
falls outside the limits given for a probability of 0.05, say, then it
may be concluded that the population from which the sample was
drawn was not strictly normal. For the weights of the 300
Princeton freshmen, for example, the sample values of β₁ and β₂
were 0.378 and 4.600. Both these are beyond the 0.01 probability
point given by Geary and Pearson’s tables for a sample of 300
(these were 0.329 and 3.79, respectively), and it may therefore be
concluded that the distribution of weights is definitely not normal.

¹ Issued by the Biometrika Office, University College, London, and
printed at the University Press, Cambridge, England.



CHAPTER XII 


USE OF THE NORMAL FREQUENCY CURVE IN SAMPLING 

ANALYSIS 

The normal frequency curve has its greatest usefulness in the
theory of random sampling.¹ While the full exposition of the
theory of random sampling is beyond the scope of this book, some
of the simpler aspects that relate to the use of the normal curve
in sampling analysis are presented in the ensuing pages of this
chapter.

SAMPLING FROM A TWOFOLD POPULATION 

The Problem. An elementary problem in the theory of sam-
pling is concerned with sampling from a twofold population.
Consider the following problem: Suppose a large city is undergoing
a fiercely contested election. The Radicals on the one hand and
the Conservatives on the other are contending hotly for the
mayoralty, and everyone in the city takes a stand on one side or
the other. The voting population of the city thus forms a group
in which a certain percentage are Radicals and a complementary
percentage are Conservatives. Prior to the election these per-
centages will not be known. They may, however, be estimated
by taking a random sample. The inferences that may be made
from such a random sample constitute the statistical problem
that will now be analyzed.

Sampling Distribution. For the sake of argument suppose that 
some omniscient being knew how each individual in the city stood 
politically. Suppose that he noted their positions on slips of 
paper — one for each individual — and put the slips into a large 
urn. Suppose, further, that there are actually an equal number 
of Radicals and Conservatives. Let the omniscient being mix the 
slips of paper thoroughly and then draw out a sample of 100 slips.²

¹ For a more elaborate exposition than is contained in this chapter, see
Smith and Duncan, Sampling Statistics, Parts II and III.

² Mundane methods of obtaining random samples are discussed in ibid.





Let him note the division of opinion for this sample, put the slips
back, and thoroughly mix them again. Finally, let him repeat
this process many times, taking a sample of 100 each time, so that
he eventually accumulates a large number of sample percentage
divisions of opinion. Many, but by no means all, of these sample
percentages will be the actual population percentage of 50; the
others will be distributed above and below the 50 per cent level.
This will be the sampling distribution of the sample percentages.
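
The labors of the omniscient being can be imitated with a random-number generator. The sketch below assumes a population so large that each slip drawn is a Radical with probability 0.5 regardless of the slips already drawn; the number of repeated samples (5,000) and the seed are arbitrary choices made for illustration.

```python
import random
from statistics import mean, pstdev

def sample_percentages(p_radical=0.5, sample_size=100, n_samples=5000, seed=1):
    """Proportion of Radicals in each of many random samples."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        radicals = sum(rng.random() < p_radical for _ in range(sample_size))
        results.append(radicals / sample_size)
    return results

percentages = sample_percentages()
print(round(mean(percentages), 3), round(pstdev(percentages), 3))
share_exactly_50 = sum(p == 0.5 for p in percentages) / len(percentages)
print(round(share_exactly_50, 3))
```

Only a small share of the samples show exactly 50 per cent Radicals; the rest scatter above and below that level, with a standard deviation in the neighborhood of 5 per cent.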

It is one of the important conclusions of the probability theory,
based upon the analysis of the preceding chapters,* that the
outcome of this process of random sampling will be a set of sam-
ples in which the relative frequency of samples in which the
division of opinion is 0 per cent Radical, 10 per cent Radical, 20
per cent Radical, 30 per cent Radical, . . . , 100 per cent Radical
will be approximately the same as the probabilities of a binomial
distribution in which $\mathbf{p}_1 = 0.50$ and $\mathbf{p}_2 = 0.50$ and N = 100.† In
other words, relative frequencies of the sample percentages may
be estimated a priori by means of the probability calculus.
Furthermore, since the size of the sample is large (N = 100), the
calculation of the probabilities can be simplified by using the
normal curve as an approximation to the binomial distribution.‡
In this problem, the curve will have a mean of 50 per cent,
because the population is equally divided between Radicals and
Conservatives by hypothesis, and a standard deviation equal to
5 per cent.§ The normal curve, with a mean of 50 per cent
and a standard deviation of 5 per cent, thus gives approximately
the “sampling distribution” for sample percentages taken from
a population in which the division of opinion is exactly 50 per
cent, and this is the sampling distribution of sample percentages
conceived in the preceding paragraph.

The foregoing result is not limited, however, to cases in which
the actual division of opinion in the entire population is exactly

* See also ibid.

† When the symbol for a sample statistic is in boldface type it refers to
the corresponding population parameter; thus here $\mathbf{p}_1$ and $\mathbf{p}_2$ refer to the
population values for which $p_1$ and $p_2$ are corresponding sample statistics.

‡ See pp 2Sa-290

§ When the variable is expressed as a percentage instead of as an absolute
deviation from an integral mean value, the formula for the standard devia-
tion is $\sqrt{(0.5)(0.5)/N}$. Cf. p. 283.




fifty-fifty but may be shown to be valid for any division of opinion
in the population.¹ Thus if the percentage of Radicals in the
population is $\mathbf{p}_1$ and the percentage of Conservatives is $\mathbf{p}_2$ (where
$\mathbf{p}_1 + \mathbf{p}_2 = 1$) and samples of size N are drawn at random from this
population, with replacements as above, then the relative fre-
quencies of various sample percentages of Radical opinion will be
given approximately by the probabilities of a normal frequency
curve whose mean is $\mathbf{p}_1$ and whose standard deviation is
$\sqrt{\mathbf{p}_1\mathbf{p}_2/N}$.
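
The mean and standard deviation of this sampling distribution follow at once from the population division and the size of the sample. The sketch below takes the sample result as a proportion, so that the mean of the sampling distribution is the population proportion of Radicals itself and the standard deviation is the square root of p₁p₂/N; the function name is illustrative.

```python
from math import sqrt

def sampling_distribution(p1, n):
    """Mean and standard deviation of the sample proportion of Radicals."""
    if not 0 <= p1 <= 1 or n <= 0:
        raise ValueError("p1 must lie between 0 and 1 and n must be positive")
    p2 = 1 - p1                      # proportion of Conservatives
    return p1, sqrt(p1 * p2 / n)

print(sampling_distribution(0.5, 100))   # the fifty-fifty case of the text
print(sampling_distribution(0.3, 400))   # an unequal division, larger sample
```

It will be observed that the standard deviation is greatest for an even division of opinion and diminishes as the sample is enlarged.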

This conclusion is of capital importance in making inferences 
about a population from which a single random sample has been 
drawn, as will now be demonstrated. 

Statistical Inferences from Samples. Types of Inference. In a
real instance, no omniscient being is available to record every-
one’s opinion. Prior to the actual election, the only practical
way of determining the division of opinion is to take a random
sample from the population. This may be done by stopping peo-
ple on the street, ringing doorbells, sending out letters, or the
like. When the results of the sample poll are counted, they may
be used to draw inferences about the true division of opinion in the
population in three ways; that is to say, three types of inference
may be drawn. (1) A certain hypothesis regarding the true
division of opinion may be tested as to its reasonableness in the
light of the sample results and either rejected or accepted.
(2) So-called “confidence limits” may be set up for which it may
be said that there is a given probability that these limits include
the true value. (3) A best single estimate may be made of the
population percentage; this is called an “optimum estimate.”
Each of these three types of inference will now be studied.

Testing a Hypothesis as to the Population Percentage. Let the
hypothesis be set up that the population is evenly divided
between Radicals and Conservatives. Suppose the sample poll of
100 voters shows 57 Radicals and 43 Conservatives. Although 
the sample shows a percentage in favor of the Radicals, it is 
possible, of course, that it may be misleading. Almost any result 
might be yielded by a single sample, whatever the population. 
If the population consisted even of 999,900 Conservatives and 100 
Radicals, it would still be possible for a random sample of 100 to 

¹ For proof of this, see Smith and Duncan, Sampling Statistics, pp.
186-190.




consist of all Radicals. Such a result would not be very proba-
ble, however, and the reasonableness of any hypothesis must be
judged by the probability of the sample result on the assumption
that the hypothesis is valid.

The general procedure for testing the hypothesis is as follows:
First, the risk that is to be allowed in rejecting a given hypothesis
when it is in fact true must be decided upon. The “coefficient of
risk,” as it is called, is commonly, but not necessarily, set at 0.05.
In other words, it is the common practice to run the risk of reject-
ing a hypothesis 5 times out of 100 when it is in fact true. When a
sampling distribution is normal, this is often done by saying that a



Fig. 102. Sampling distribution of sample percentages of 100 votes.

given hypothesis will be rejected if the sample result falls beyond
±2σ from the mean value given by the hypothesis.¹ In the
present instance, the hypothesis that the true division of opinion
is fifty-fifty suggests that random samples of 100 taken from such
a population will have a mean percentage of Radical votes equal to
$\mathbf{p}_1$ = 50 per cent and a standard deviation of sample percentages
equal to $\sqrt{\mathbf{p}_1\mathbf{p}_2/N} = \sqrt{(0.5)(0.5)/100}$ = 5 per cent.
Accordingly, 95 per cent of the sample percentages would fall
between 50 per cent ± 2 × 5 per cent, or between 40 and 60 per
cent; 5 per cent of sample percentages would fall below 40 and
above 60 per cent. Hence, if this hypothesis is rejected when a

1 The desirability in some cases of using regions of rejection that fall all
above or all below the mean is discussed in ibid., pp. 196-201.




single sample return yields a percentage of Radical vote below 
40 or above 60 per cent, then the hypothesis would in many 
sample polls be rejected only 5 per cent of the time when it was 
actually true. In other words, the rejector would be wrong only 
1 out of 20 times in a large number of tries.

For the given problem, suppose the coefficient of risk is put at 
0.05. Since the sample return is 57 Radical votes out of a total 
of 100, the hypothesis of an equal division of opinion is not to be 
rejected, for the sample result does not fall in the region of rejec- 
tion below 40 or above 60 per cent. In this instance, the sample 
result does not deviate sufficiently from the hypothetical per- 
centage to cause its rejection. If the sample return had been 
62 Radicals and 38 Conservatives, however, the hypothesis of 
an equal division of opinion would have been rejected and it would 
have been concluded that the Radicals were in the majority. 
This argument and these conclusions are illustrated graphically 
in Fig. 102. 

From the figure it is seen that with a sample result of 57 per
cent the hypothesis that $\mathbf{p}_1 = 0.5$ is accepted, while with a sample
result of 62 per cent the hypothesis that $\mathbf{p}_1 = 0.5$ is rejected.
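
The rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `two_sigma_test` and its arguments are invented for the example, and the two-sigma band is taken as the region of acceptance for a coefficient of risk of about 0.05.

```python
import math

def two_sigma_test(successes, n, p0, k=2.0):
    """Hypothetical helper: test the hypothesis that the population proportion is p0.

    Returns (standard_error, lower, upper, reject), where lower and upper bound
    the region in which the hypothesis is not rejected.
    """
    # Standard deviation of sample proportions if the hypothesis is true.
    se = math.sqrt(p0 * (1.0 - p0) / n)
    lower, upper = p0 - k * se, p0 + k * se
    p_hat = successes / n
    reject = p_hat < lower or p_hat > upper
    return se, lower, upper, reject

se, lower, upper, reject_57 = two_sigma_test(57, 100, 0.5)
_, _, _, reject_62 = two_sigma_test(62, 100, 0.5)

print(f"standard error = {se:.2f}; limits = {lower:.2f} to {upper:.2f}")
print("57 Radicals:", "reject" if reject_57 else "do not reject")
print("62 Radicals:", "reject" if reject_62 else "do not reject")
```

With 57 Radicals the sample proportion lies inside the limits of 0.40 and 0.60; with 62 it lies outside them.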

Determining Confidence Limits for Population Percentage. 
Before confidence limits can be established for a population 
percentage it is first necessary to decide upon the degree of con- 
fidence that is to be placed in the computed limits. This is 
usually determined by so choosing the limits that the probability 
of their including the true percentage equals an agreed-upon 
figure, called the “confidence coefficient.” For example, if the 
confidence coefficient is set at 0.95, as is the common practice, 
then the limits will be so chosen that the probability of their 
embracing the true value is just 0.95. 

In the case of a normal sampling distribution, confidence 
limits with a confidence coefficient of 0.95 may be set up as fol- 
lows: Choose as the upper confidence limit a value for the popula- 
tion percentage that, if it were the true value, would make the 
probability of getting the given sample value or a lower sample 
value just equal to 0.025. Since the sampling distribution is 
normal, this upper limit may be obtained by choosing $\mathbf{p}_1$ so that
the sample value of 57 per cent falls at $-2\boldsymbol{\sigma}$ from the mean value
of the sample percentage, i.e., at $-2\boldsymbol{\sigma}$ from $\mathbf{p}_1$. The mathematical
equation becomes



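$$\frac{0.57 - \mathbf{p}_1}{\sqrt{\mathbf{p}_1 \mathbf{q}_1 / N}} = -2$$

or, since $\mathbf{q}_1 = 1 - \mathbf{p}_1$,

$$\frac{0.57 - \mathbf{p}_1}{\sqrt{\mathbf{p}_1 (1 - \mathbf{p}_1) / N}} = -2$$

When solved for $\mathbf{p}_1$, this becomes

$$\mathbf{p}_1 = \frac{0.57 + \dfrac{2}{N} + 2\sqrt{\dfrac{(0.57)(0.43)}{N} + \dfrac{1}{N^2}}}{1 + \dfrac{4}{N}}$$

When N is large, as it must be if the normal distribution is to be
used as an approximation to the binomial distribution, the terms
$2/N$, $4/N$, and $1/N^2$ can be dropped from the above equation
without materially affecting the result. In this approximate
form it becomes

$$\mathbf{p}_1 = 0.57 + 2\sqrt{\frac{(0.57)(0.43)}{100}} = 0.67$$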

In effect, this indicates that the upper confidence limit can be
found approximately by adding to the sample percentage twice
the standard deviation of the sampling distribution, computed
with the sample percentage in place of the hypothetical popula-
tion percentage. In general, if $p_1$ is taken as the sample per-
centage (note that sample statistics are printed in text type and
the corresponding population parameters in boldface type), the
upper confidence limit of the population percentage is given by

$$\mathbf{p}_1 = p_1 + 2\sqrt{\frac{p_1 q_1}{N}} \qquad (1)$$

This is shown graphically in Fig. 103a.

In a similar manner, the lower confidence limit is given approxi-
mately by the formula

$$\mathbf{p}_1 = p_1 - 2\sqrt{\frac{p_1 q_1}{N}} \qquad (2)$$



For the given instance, in which $p_1 = 0.57$, this lower limit is

$$\mathbf{p}_1 = 0.57 - 2\sqrt{\frac{(0.57)(0.43)}{100}} = 0.47$$

This is shown graphically in Fig. 103b.

How the upper limit is determined, how the lower limit is 
determined, and the resulting range or total interval between the 
confidence limits are pictured graphically in Figs. 103a, b, and c.



The limits of the range are 0.47 to 0.67. This is known tech- 
nically as the “confidence interval” and is shown in Fig. 103c. 
Owing to the manner in which the confidence limits were derived, 
it may be said that there is a probability of 0.95 that this con- 
fidence interval includes the true population percentage. By this 
is meant that, if confidence intervals were set up like this from 
many samples, 95 per cent of them would include the true
population percentage. 
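
As a rough check on the limits 0.47 and 0.67, the following Python sketch computes both the approximate limits (sample percentage plus and minus twice its estimated standard deviation) and the limits obtained by solving the quadratic equation without dropping the small terms. The function names `approx_limits` and `exact_limits` are assumptions made for this illustration.

```python
import math

def approx_limits(p_hat, n, k=2.0):
    """Approximate limits: p_hat -/+ k * sqrt(p_hat * q_hat / n)."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - k * se, p_hat + k * se

def exact_limits(p_hat, n, k=2.0):
    """Both roots of (p_hat - p)^2 = k^2 * p * (1 - p) / n, solved for p."""
    c = k * k / n                      # equals 4/N when k = 2
    centre = p_hat + c / 2.0           # p_hat + 2/N when k = 2
    half = k * math.sqrt(p_hat * (1.0 - p_hat) / n + c / (4.0 * n))
    return (centre - half) / (1.0 + c), (centre + half) / (1.0 + c)

a_low, a_high = approx_limits(0.57, 100)
e_low, e_high = exact_limits(0.57, 100)

print(f"approximate limits: {a_low:.2f} to {a_high:.2f}")
print(f"limits from the full quadratic: {e_low:.3f} to {e_high:.3f}")
```

For N = 100 the two sets of limits differ only in the third decimal place, which is why the small terms can be dropped.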

An Optimum Estimate of the Population Percentage. Up to
this point in the argument, a particular hypothesis regarding the 




population has been tested and a method of setting up confidence
intervals has been devised. A final problem of statistical
inference is to indicate a method of making a single best estimate
of the population percentage from the given sample. Various



Fig. 104. Diagram showing relationship between probability of sample and
likelihood of population percentage.

methods are employed, but the one that has received consider-
able prominence in recent years and that will be employed here
is the method of maximum likelihood.

When a population percentage is given, the probabilities of
various sample results may be determined from the sampling





distribution of sample percentages, in this case, approximately
from the normal frequency curve. The analysis here runs from
a given population percentage to probabilities of various sample
results. When a particular sample result is given, however, it
is possible to determine the probabilities of obtaining this sample
result from various hypothetical values for the population per-
centage. Here the analysis runs from a given sample percentage
to the probabilities of obtaining the particular sample from
various hypothetical population percentages. In the latter
analysis, the logarithm of the probability of the given sample
result for a particular value of the population percentage is
called the “likelihood” of the population percentage.

As shown in Figs. 104a to 104c, these likelihoods will vary for
different hypothetical values of the population percentage. The
value of the population percentage that has the maximum likeli-
hood is considered the best, or optimum, estimate of the popula-
tion percentage; this is shown in Fig. 104d. Figures 104a to
104c show graphically how the likelihoods of various population
percentages (or, more exactly, their antilogs) vary with changes
in the hypothetical values for these percentages. These various
results are summarized in Fig. 104d, which, if completed for a
large number of hypothetical values of the population per-
centage, would become a smooth curve showing the variation
in the antilogs of the likelihoods of $\mathbf{p}_1$ with changes in $\mathbf{p}_1$. It is
to be noted that the maximum point of this curve is also the
point of maximum likelihood, since a logarithm is a maximum
when its antilog is a maximum.

Without undertaking the mathematical analysis involved,¹
it may be pointed out that the value of $\mathbf{p}_1$ which has the maximum
likelihood is the value for which $\mathbf{p}_1 = p_1$. In other words,
the maximum likelihood estimate of a population percentage is the
percentage yielded by a given sample. This then becomes the
best estimate of the population figure; that is to say, the sample
percentage is the optimum, or best, estimate of the population
percentage.
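
The statement that the sample percentage has the maximum likelihood can be checked numerically. The sketch below is an illustration under assumptions of its own: it uses the exact binomial probability of the sample rather than the normal approximation, it searches only a grid of hypothetical population values from 0.01 to 0.99, and the function name `log_likelihood` is invented for the example.

```python
import math

def log_likelihood(p, successes, n):
    """Logarithm of the binomial probability of the observed sample,
    given a hypothetical population proportion p (0 < p < 1)."""
    return (math.log(math.comb(n, successes))
            + successes * math.log(p)
            + (n - successes) * math.log(1.0 - p))

# Hypothetical population proportions 0.01, 0.02, ..., 0.99.
grid = [i / 100 for i in range(1, 100)]

# The value on the grid whose likelihood is greatest for 57 Radicals in 100.
best = max(grid, key=lambda p: log_likelihood(p, 57, 100))
print("value of maximum likelihood on the grid:", best)
```

The search returns the sample proportion itself, in agreement with the result quoted above.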

SAMPLING OF MEANS AND VARIANCES 

Sampling Distribution of Means and Variances. The Mean. 
Most of the preceding analysis applying to sample percentages 

1 For such analysis, see ibid., pp. 208-209.





applies equally well to means of samples from a continuously
distributed population. If the population is normal in form, it
can readily be demonstrated that means of samples from such a
population will form a frequency distribution which is also normal
in form, the mean of which is the mean of the population and the
variance of which is the variance of the population divided by the
size of the sample.

If the population is not normal, the sampling distribution of
sample means nevertheless tends to be normal, with a mean
equal to the mean of the population and a variance equal to the
variance of the population divided by the size of the sample.¹

Accordingly the equation for the standard deviation of the
sampling distribution of sample means is as follows:

$$\boldsymbol{\sigma}_{\bar{X}} = \frac{\boldsymbol{\sigma}}{\sqrt{N}} \qquad (3)$$

This is conventionally called the “standard error” of the mean.²
The Variance. If samples are taken from a normal population,
the sampling distribution of sample variances is not normal for
small samples but approaches the normal form as the samples
become larger, say larger than 30 cases. The mean of this
normal distribution is the variance of the population, and the
standard deviation of the sampling distribution is the variance
of the population multiplied by $\sqrt{2/N}$.

It is to be noted that, if the population is not normal, the
sampling distribution of sample variances may not become
normal, even for relatively large samples. Hence the use of
the normal curve for making inferences about a population
variance when the population is not normal may be an unwise
procedure, even when the sample is large.
But for variances of large samples taken from normal popula-
tions, the standard error of the variance is given by

$$\boldsymbol{\sigma}_{\sigma^2} = \boldsymbol{\sigma}^2 \sqrt{\frac{2}{N}} \qquad (4)$$


1 Ibid., p. 164.

2 Standard errors are printed in boldface type because they represent the
standard deviations of the populations of all possible sample statistics of
the type in question. Thus $\boldsymbol{\sigma}_{\bar{X}}$ is the standard deviation of all possible
sample $\bar{X}$'s.





The Standard Deviation. For standard deviations of large
samples taken from normal populations, the standard error of
the standard deviation is given by

$$\boldsymbol{\sigma}_{\sigma} = \frac{\boldsymbol{\sigma}}{\sqrt{2N}} \qquad (5)$$


Inferences about Population Means and Variances. Since 
the sampling distribution of sample means tends to be normal 
in form and the same is true of the sampling distribution of 
variances and standard deviations, if the population is normal, 
it follows that the normal curve can be used to make inferences 
about the population values of these parameters from correspond-
ing sample statistics.

Testing a Hypothesis about the Population Mean. To illustrate 
how a hypothesis about a population mean may be tested, con- 
sider the following example. Suppose it is claimed that the 
mean length of life of a certain make of shoe (with constant wear)
is 11.5 months. A random sample of 100 shoes is tested, and
it is found that the average length of life of this sample is 10.8
months. The standard deviation of the sample is 1.2 months.
Do these sample results warrant the rejection of the claim of a
true mean value of 11.5 months? 

To answer this question, proceed as follows: Let the risk of 
rejecting a hypothesis when it is true be set at 0.05. Then cal- 
culate the standard deviation of the sampling distribution of the 
mean (the “standard error” of the mean, as it is called) from 
Eq. (3). Since the standard deviation of the population is not 
known in this instance, the standard deviation of the sample 
must be used in its stead.¹

The value of $\boldsymbol{\sigma}_{\bar{X}}$ for the given problem is accordingly

$$\boldsymbol{\sigma}_{\bar{X}} = \frac{1.2}{\sqrt{100}} = 0.12 \text{ month}$$

Next, calculate the difference between the hypothetical value
of the mean and the sample value of the mean. In absolute value this is

$$|10.8 - 11.5| = 0.7 \text{ month}$$


Finally, compare this difference with the standard error of the 
mean. If the difference is more than twice the standard error, 

1 This substitution does not materially affect the analysis when the 
sample is large. For further discussion, see ibid., pp. 273-284. 




the hypothesis will not be accepted. In the present instance 0.7
is over five times greater than 0.12, so the claim that the true
mean is 11.5 is rejected. The sample mean deviates too greatly
from the hypothetical mean for the latter to be accepted as
reasonable.

Confidence Limits for the Population Mean. Confidence limits
for the true mean with a confidence coefficient of 0.95 will be
obtained by laying off $2\boldsymbol{\sigma}_{\bar{X}}$ plus and minus from the sample
value. Thus, in the present problem these limits will be
10.8 ± 2(0.12) = 11.04 and 10.56. Accordingly it can be said
that there is a probability of 0.95 that the interval from 10.56 to
11.04 includes the true population mean within its range.

Optimum Estimate of the Population Mean. If the method of
maximum likelihood is used to give the best estimate of the
population mean, it is found that the sample mean is the maxi-
mum likelihood estimate of the population mean. Hence, in
the present instance the best estimate of the population mean
is 10.8 months.
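
The three inferences about the mean in the shoe example can be gathered into one short Python sketch. It is offered only as an illustration: the variable names are assumptions, the sample standard deviation is used in place of the unknown population value as in the text, and the factor 2 stands for the 0.05 coefficient of risk.

```python
import math

n = 100                # size of the sample of shoes
sample_mean = 10.8     # months
sample_sd = 1.2        # months, used in place of the population value
claimed_mean = 11.5    # months, the hypothesis under test

# Standard error of the mean, Eq. (3).
se_mean = sample_sd / math.sqrt(n)

# How many standard errors separate the sample mean from the claim?
ratio = abs(sample_mean - claimed_mean) / se_mean
reject = ratio > 2.0

# Confidence limits with a confidence coefficient of about 0.95.
lower, upper = sample_mean - 2.0 * se_mean, sample_mean + 2.0 * se_mean

# The maximum likelihood estimate of the population mean is the sample mean.
best_estimate = sample_mean

print(f"standard error = {se_mean:.2f} month, ratio = {ratio:.1f}")
print("claim rejected" if reject else "claim not rejected")
print(f"confidence limits: {lower:.2f} to {upper:.2f} months")
```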

Testing a Hypothesis about the Population Variance. The
same analysis can be applied to inference regarding population
variances from sample variances. Suppose it is claimed that
the true variability in the life of the given make of shoes, as
measured by the standard deviation, is 1.0 month. As in the case
of the mean, this hypothesis may be
tested by comparing the hypothetical value with the standard
deviation of the sample of 100 shoes, which it will be assumed is
1.2 months.

The variances, or squares of the standard deviations, are 1.0
and 1.44 square months, respectively. Their difference is
1.44 - 1.00 = 0.44 square month. The standard deviation of
the sampling distribution of sample variances, i.e., the standard
error of the sample variance, is

$$\boldsymbol{\sigma}_{\sigma^2} = 1.00\sqrt{\frac{2}{100}} = 0.14$$

Since the difference between the hypothetical value and the
sample value is more than three times (0.44/0.14 = 3+) the
standard error of the sample variance, the hypothesis must again
be rejected.¹

1 For more exact methods especially applicable to small samples, see ibid.,
pp. 284-287.





If it were desired to test a hypothesis about the standard devia- 
tion, rather than about the variance, Eq. (5) would be used. In 
the present instance, the population standard deviation is hypo- 
thetically set at 1.0 month and the standard deviation in the 
sample is 1.2 months; the difference is 0.2 month. Using Eq. (5), 
the standard error of the standard deviation in this problem is 
found to be 


$$\boldsymbol{\sigma}_{\sigma} = \frac{1.00}{\sqrt{2(100)}} = 0.07$$


Since the difference is almost three times the standard error, the 
hypothesis is rejected as unreasonable. 

Confidence Limits for the Population Variance. Confidence 
limits for the population variance with a confidence coefficient of 
0.95 are given by 

$$\boldsymbol{\sigma}^2 = \sigma^2 \pm 2\boldsymbol{\sigma}_{\sigma^2} = 1.44 \pm 2(0.14) = 1.72 \text{ and } 1.16$$

It can thus be said that there is a probability of 0.95 that the
interval from 1.16 to 1.72 includes the true variance. The cor-
responding interval for the population standard deviation is from
1.06 to 1.34, obtained by making use of Eq. (5).
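
The calculations for the variance and the standard deviation can be sketched in the same way. The helper names `se_variance` and `se_std_dev` are assumptions made for this illustration; they simply evaluate Eqs. (4) and (5) with the hypothetical population standard deviation of 1.0 month.

```python
import math

def se_variance(pop_variance, n):
    """Standard error of a sample variance, Eq. (4): variance * sqrt(2/N)."""
    return pop_variance * math.sqrt(2.0 / n)

def se_std_dev(pop_sd, n):
    """Standard error of a sample standard deviation, Eq. (5): sd / sqrt(2N)."""
    return pop_sd / math.sqrt(2.0 * n)

n = 100
hyp_sd, sample_sd = 1.0, 1.2

se_var = se_variance(hyp_sd ** 2, n)
se_sd = se_std_dev(hyp_sd, n)

# Differences expressed in units of the corresponding standard error.
ratio_var = (sample_sd ** 2 - hyp_sd ** 2) / se_var
ratio_sd = (sample_sd - hyp_sd) / se_sd

# Two-sigma confidence limits laid off from the sample values.
var_limits = (sample_sd ** 2 - 2.0 * se_var, sample_sd ** 2 + 2.0 * se_var)
sd_limits = (sample_sd - 2.0 * se_sd, sample_sd + 2.0 * se_sd)

print(f"standard errors: variance {se_var:.2f}, standard deviation {se_sd:.2f}")
print(f"ratios: variance {ratio_var:.2f}, standard deviation {ratio_sd:.2f}")
print(f"variance limits: {var_limits[0]:.2f} to {var_limits[1]:.2f}")
print(f"standard deviation limits: {sd_limits[0]:.2f} to {sd_limits[1]:.2f}")
```

Carried to two decimals, 1.44 ± 2(0.14) evaluates to roughly 1.16 and 1.72, and 1.2 ± 2(0.07) to roughly 1.06 and 1.34.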

Optimum Estimate of the Population Variance. Finally, the 
maximum likelihood estimate of the population variance is (for 
large samples) approximately the variance of the sample.¹
Hence the best estimate of the population variance in this instance 
is 1.44, which gives a population standard deviation of 1.2. 

CONCLUSION 

From the few illustrations in this chapter, it should be clear 
that the normal curve is very useful in making inferences about 
populations from random samples. It can be used to measure 
sampling fluctuations in sample percentages, sample means, and, 
in certain instances, sample variances, as well as in a number of 

1 For small samples the multiplier $N/(N - 1)$ should be applied to the sample
variance to give a better estimate of the population variance. Thus the
optimum estimate, if N is small, say less than 30, is as follows:

$$\text{estimate of } \boldsymbol{\sigma}^2 = \frac{N}{N - 1}\,\sigma^2$$

Cf. ibid., pp. 290-291.




other statistics. It also has many uses in more advanced sam-
pling analyses and is probably the most important sampling
distribution that occurs in statistical theory.

Table 23 contains not only the standard errors discussed in
this chapter but also the standard errors for a number of other
statistics. The method of applying these formulas to test
hypotheses, to set up confidence intervals, or to obtain optimum
estimates is similar for all statistics obtained from large samples.

Table 23. Sampling Errors in Elementary Statistics for Which the
Sampling Distribution Approximates the Normal Curve
(Ordinarily these formulas for standard error cannot be used for N < 30)

Statistic                          Standard error
Median                             $1.25331\,\boldsymbol{\sigma}/\sqrt{N}$
$z$ (transformation of $r$)        $1/\sqrt{N - 3}$
Quartiles $Q_1$, $Q_3$             $1.36263\,\boldsymbol{\sigma}/\sqrt{N}$

[The remaining entries of the table are illegible in the source scan.]


*e/\lADCR IlbehtC D/S(«lurirol [1038) pp 142 11" 



PART IV 


Study of Bivariates and Multivariates 

CHAPTER XIII
SIMPLE CORRELATION 
CORRELATION FUNDAMENTAL TO KNOWLEDGE 

Progressive development in the methods of science and philos-
ophy has been characterized by increase in the knowledge of
relationships, or correlations. Nature has been found to be a
multiplicity of interrelated forces. The phenomena of the
physical world outside man seem to be well adapted to this
concept of interrelationship. The same is true with respect
to phenomena having to do with human beings and their
environment.

Progress in the Discovery of Correlation. In the physical
sciences, where the laws of nature are, within certain limits,
determinate, experimental method has sufficed to disclose innu-
merable relationships. Many of these physical correlations have
become definitely known as “cause and effect relationships.” To
some degree, too, this is true of biology, anthropology, geology,
and the like. In these fields of study, great progress was made
possible by the use of observation of “cases,” by tracing cor-
relations previously known or suspected, and by laboratory
experiments. In the social sciences, however, the establishment
of certain knowledge, or knowledge of a high degree of probability
regarding relationships, is a more difficult problem; and little
scientific progress, comparatively speaking, has been made
through the speculative method. This is particularly true so far
as cause and effect relationships are concerned.

For example, philosophical speculation, based upon qualitative
or semiquantitative observation of experience, seemed to many
economists of the eighteenth, nineteenth, and twentieth centuries


to have codified the relationship between money and credit on the
one hand and prices and many social problems on the other hand.
But no such certainty among these social scientists now exists as
to the nature of the cause and effect order of events. In its
earlier conception, the principle of the quantity theory of money
seemed to be one of extraordinary simplicity and determinate-
ness, but the more it is studied in its quantitative aspects the
more complicated it is found to be in reality. By the 1930’s
and 1940’s, the world of scientific monetary theorists came to be
characterized by confused controversy. The practical world still
awaits their solution of the theoretical problem in order to make
possible a world-wide solution of the problem of monetary reform.
Some say that increases or decreases in the quantity of money
cause rising and falling prices, respectively, but others, with con-
vincing argument, maintain that rising prices cause an increase in
the quantity of money, and vice versa. It is a moot question as
to whether or not statistics can come to the rescue in the matter
of deciding the direction of the cause and effect relationship,
but at least the technique has been developed to disclose the
facts of relationship more precisely than was ever before
possible.

By the latter half of the nineteenth century, in many fields of
study, a point had been reached where speculation concerning
relationships could advance no farther with the existing tech-
niques. More exact measurement of relationship was needed.
Many questions in biology, anthropology, and the social sciences
generally awaited a scientific answer to the question: How can
relationship be measured? Two interesting attempts were made
by American scholars to devise a method of measuring relation-
ship, one in 1877 and the other in 1892.¹ Credit for the discovery
of a method, and for its subsequent mathematical development,
however, belongs largely to the scholars of England.

Origin and Development of the Measurement of Correlation. In
the nineteenth century pre-Darwinian and Darwinian doctrines of
1 Bowditch, H. P., “The Growth of Children,” Eighth Annual Report
of the State Board of Health of Massachusetts (1877), pp. 275-324; Bryan,
W. L., “On the Development of Voluntary Motor Ability,” American
Journal of Psychology, Vol. 5 (1892), pp. 123-204. These are both described
in Helen M. Walker, Studies in the History of Statistical Method (1929), pp.
100-102, 109-110.





evolution were taking root, and the question of the influence of 
heredity vs. environment upon human characteristics was in a 
state of rarefied speculation and controversy. The experimental
data appeared chaotic and amenable to as many interpretations 
as there were interpreters. 

One of the great nineteenth-century students of the problem of 
heredity was Sir Francis Galton. He had been profoundly 
impressed by Darwin’s Origin of Species (1859), concerning which
he said,¹ “Its effect was to demolish a multitude of dogmatic
barriers by a single stroke, and to arouse a spirit of rebellion 
against all ancient authorities whose positive and unauthenticated 
statements were contradicted by modern science.” Galton made 
numerous studies on the subject of heredity. The question that 
was motivating his studies was: How is it possible for a whole 
population to remain alike in its features, as a whole, during many
successive generations, if the average produce of each couple
resemble their parents? He attacked the question by studying
sweet peas, moths, hounds, and finally the records of human 
families, which he obtained by offering prizes. 

Between the years 1877 and 1889, Galton worked out a mathe- 
matical method by which he could give an exact measure of the 
relationship between, for example, heights of children and the 
average heights of their parents. By statistical measurement he 
found that, if the stature of a group of parents is found to be, say 
y inches above or below the general average of the race, the aver-
age stature of their children will be only ⅔y inches above or below
the average of the race; and he induced the law that the mean
heights of offspring tend to “regress back toward the mean of the
race” in spite of the strong hereditary influence of the parents.
This is the famous law of regression to type, although the exact
figure ⅔ is not to be taken as final.

The method Galton used was based upon the median and 
quartiles and has not been generally followed in subsequent work.
In the 1890’s another method, based on the arithmetic mean
and the standard deviation, was devised by Karl Pearson. His

1 “Hereditary Talent and Character,” Macmillan’s Magazine, Vol. 12
(May, 1865-October, 1865), pp. 157-166, 318-327; Hereditary Genius
(1869, 2d ed. 1892); English Men of Science (1874); Human Faculty (1883);
Record of Family Faculties (1884); Life History Album (1884); Natural
Inheritance (1889). Cf. Walker, op. cit., pp. 102-103.





method has been widely adopted and is known as the “Pearsonian
coefficient of correlation.”¹

It should be pointed out that in the fields of meteorology and
astronomy mathematicians had previously worked out a formula
for a joint or bivariate normal frequency distribution. This
gave the probability of the simultaneous occurrence of two errors
of observation but did not directly indicate a measure of correla-
tion between them. Work in this field was more concerned with
the simultaneous occurrence of independent errors than of
correlated errors.² Galton, as already indicated, was primarily
concerned with the problem of correlation, and it remained for
Karl Pearson and others to combine the work of Galton and the
work of the mathematicians into a unified theory of correlation.
Pearson’s development of the theory of correlation will be
explained on pages 338 to 349.

Applications of the Method by Social Scientists. As early as
1901, R. H. Hooker, using the Pearsonian coefficient, studied
correlation between marriage rates and trade. He correlated
marriage rates with per capita exports of England, with per
capita imports, and with other trade events.³ In 1906, G.
Udny Yule likewise made a study of correlation between mar-
riage rates and trade. He also correlated trade activity with
birth rates and death rates but found little correlation between
them.⁴

1 Cf. Walker, op. cit., pp. 110-115; Pearson, Karl, “Notes on the
History of Correlation,” Biometrika, Vol. 13 (1920-1921), pp. 25-45, where
he cites W. F. R. Weldon, “Variations Occurring in Certain Decapod Crus-
tacea. I. Crangon vulgaris,” Proceedings of the Royal Society of London, Vol.
47 (1890), pp. 445-453; Weldon, W. F. R., “Certain Correlated Variations
in Crangon vulgaris,” Proceedings of the Royal Society of London, Vol. 51
(1892), pp. 2-21; Yule, G. U., “On the Theory of Correlation,” Journal of
the Royal Statistical Society, Vol. 60 (1897), pp. 812-850.

2 Pretorius, S. J., “Skew Bivariate Frequency Surface, Examined in the
Light of Numerical Illustrations,” Biometrika, Vol. 22 (1930-1931), pp.
109-223; Pearson, Karl, “The Contribution of Giovanni Plana to the
Normal Bivariate Frequency Surface,” Biometrika, Vol. 20A (1928), pp.
295-298; Walker, Helen M., “The Relation of Plana and Bravais to the
Theory of Correlation,” Isis, Vol. 10, No. 34 (1938), pp. 466-484.

3 “Correlation of the Marriage-rate with Trade,” Journal of the Royal
Statistical Society, Vol. 64 (September, 1901), pp. 485-492.

4 Yule, G. Udny, “On Changes in the Marriage- and Birth-rates in
England and Wales,” Journal of the Royal Statistical Society, Vol. 69





The entire science of biometrics has been built up by the 
development of correlation methods; Karl Pearson is one of the 
founders of Biometrika, the scientific organ in that field of study.
Correlation measurement has been intensively applied in psy-
chological and educational research.¹ In recent years, the
correlation method has played an important role in the analysis 
of economic problems and in economic theory, a trend particularly 
evident in the field of agricultural economics. 


THE BIVARIATE FREQUENCY DISTRIBUTION 


The statistical basis for the study of correlation is the bivariate 
or multivariate frequency distribution. In the univariate 
frequency distributions studied in the previous chapters, the 
data were classified according to a single characteristic. In 
bivariate or multivariate distributions, data are classified accord- 
ing to two or more characteristics. This chapter will be con- 
cerned with the analysis of bivariate distributions. Chapter 
XVI will deal with multivariate distributions. 

An Illustration of a Bivariate Distribution. Table 24 shows 
the distribution of grades of 81 freshmen in a second-semester
English course at Mount Holyoke. For each of these 81 students


Table 24. Grades of 81 Mount Holyoke Freshmen in a Second-
semester English Course

Grades, X1      Frequency, F
  60-                 1
  80-                 0
 100-                 3
 120-                 0
 140-                 2
 160-                 9
 180-                 8
 200-                16
 220-                17
 240-                13
 260-                 9
 280-                 1
 300-                 2




(1906), pp. 88-132; “The Applications of the Method of Correlation to Social
and Economic Statistics,” Journal of the Royal Statistical Society, Vol. 72
(1909), pp. 721-730.

1 Rugg, Harold O., Statistical Methods Applied to Education.




there is also available the grade in first-semester English. Hence
they may be cross-classified according to both their first- and
second-semester grades. This has been done in Table 25.


Table 25. A Bivariate Frequency Distribution of 81 Mount Holyoke
Freshmen According to Their Grades in First- (X2) and Second-
(X1) Semester English

[The body of this table is illegible in the source scan. Columns are classes
of first-semester grades (X2), rows are classes of second-semester grades
(X1), and the grand total is 81.]


The bivariate frequency distribution represented by Table 25
gives more complete information than is contained in the uni-
variate frequency distribution of Table 24. Of the 8 students
having second-semester grades between 180 and 200, the seventh
row of Table 25 shows that 2 had first-semester grades between
140 and 160, 4 had first-semester grades between 160 and 180,
and 2 had first-semester grades between 180 and 200. This is a
small univariate frequency distribution of the group of students








who had grades between 180 and 200 in their second-semester
course. In Table 25 there are 11 rows and 11 columns, each of
which contains a univariate frequency distribution. Since there
are 11 subgroups of 11 groups, there are altogether 121 classes,
represented in the table by 121 squares, or cells, of which 28 cells
contain frequencies.

The totals of the columns of Table 25 give the univariate
frequency distribution of all the students classified according to
their first-semester English grade. The totals of the rows give
the univariate frequency distribution of all the students classified
according to their second-semester English grades.

For each of the columns an arithmetic mean could be calcu-
lated and the question could be answered: Did girls who earned
high grades in their first-semester English average higher grades
in second-semester English than did the girls who attained only
low grades in their first-semester English? An arithmetic mean
could similarly be calculated for each of the row frequency
distributions. For all the 11 column frequency distributions
and all the 11 row frequency distributions the standard deviations
also could be calculated. In other words, in this bivariate
frequency distribution there are 22 univariate frequency dis-
tributions in addition to the 2 univariate frequency distributions
represented by the totals for the respective variables. Each of
these 22 frequency distributions might be analyzed in the same
way as any frequency distribution.
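
How the row totals, column totals, and filled cells of a bivariate table are obtained can be shown with a small Python sketch. The 3 × 3 table below is hypothetical and is not the Mount Holyoke data; the variable names are likewise assumptions made for the illustration.

```python
# Hypothetical cross-classification, NOT the data of Table 25.
# Rows: classes of the second-semester grade; columns: classes of the
# first-semester grade. Each entry is a cell frequency.
table = [
    [2, 1, 0],
    [1, 4, 2],
    [0, 2, 3],
]

# Each row and each column is itself a small univariate distribution.
row_totals = [sum(row) for row in table]          # distribution of the row variable
col_totals = [sum(col) for col in zip(*table)]    # distribution of the column variable
grand_total = sum(row_totals)

# Number of cells in the table and number that contain frequencies.
cells = len(table) * len(table[0])
filled_cells = sum(1 for row in table for f in row if f > 0)

print("row totals:", row_totals)
print("column totals:", col_totals)
print("grand total:", grand_total)
print(f"{filled_cells} of {cells} cells contain frequencies")
```

In this small case there are 3 row distributions and 3 column distributions in addition to the 2 distributions given by the totals.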

METHODS OF SUMMARIZATION AND COMPARISON IN 
BIVARIATE DISTRIBUTIONS 

The characteristics of a bivariate frequency distribution can
be described by various statistics. Many of these are the same
as the statistics employed in the description of a univariate
frequency distribution, but some are new. Thus, the central
tendency of one of the two variables may be measured by its
mean, its mode, or its median. The dispersion of this variable
may be measured by its range, standard deviation, average
deviation, or quartile deviation; and its skewness and kurtosis
may be measured by $\beta_1$ and $\beta_2$, respectively. The same is true
for the other variable and for the numerous univariate frequency
distributions that make up the details of a single bivariate
distribution, as explained in the preceding paragraph. New



328 SIVDY OF niViRlAlES lV/> MVLnVAlUMFh 

btatistics aic required, however, to deacnbo the tendeucj of the 
vanabiet. to var^ m unibon A bivanatc frequenej distnbutioii 
thus pre«;cnta the new problem of measuring con elation and the 
discov er> of statistics for measuring it 

Progressions of Means. If the data are grouped in the form 
of a bivariate scatter diagram such as Table 25, one way to 
measure the association between the two variables is to compute 
the mean values of one variable for various values of the other 
variable. In Table 25, for example, the means of the columns 
would show how the X1 variable tends to change on the average 
with changes in X2, and the means of the rows show how the X2 
variable tends to change on the average with changes in X1. 
The values of these column and row means are given in Table 26 
and graphed in Figs. 105 and 106. 

The nature of the association between the variables is evident 
from these graphs. Consider, for example, the progression of 
the means of X1 shown in Table 26 and Fig. 105. These show 
that the mean value of X1 tends to increase with increases in X2. 
Thus, when X2 is between 100 and 120, the mean value of X1 is 
110; when X2 is between 200 and 220, the mean value of X1 is 
222.31; and when X2 is between 260 and 280, the mean value of X1 
is 266.0. Although the increase in the average value of X1 with a 
given increase in X2 does not appear to be uniform, the progression 
of the means of X1 with a change in X2 does appear to follow 
a straight line. The same can be said of the progression of the 
means of X2 with changes in X1. 
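
The computation behind a progression of means can be sketched in a few lines of Python. This is only an illustration: the grade pairs are hypothetical (they are not the Table 25 data), and the function name `progression_of_means` with its `width` and `start` parameters is an assumption introduced here.

```python
from collections import defaultdict

def progression_of_means(pairs, width, start):
    """Mean of X1 within each class interval of X2.

    pairs holds (x1, x2) observations; classes of X2 run from `start`
    in steps of `width`. Returns {lower class limit: mean of X1}.
    """
    groups = defaultdict(list)
    for x1, x2 in pairs:
        lower = start + width * int((x2 - start) // width)
        groups[lower].append(x1)
    return {lower: sum(v) / len(v) for lower, v in sorted(groups.items())}

# Hypothetical (second-semester, first-semester) grade pairs.
pairs = [(110, 105), (150, 125), (170, 130), (190, 150), (230, 205), (210, 215)]
means = progression_of_means(pairs, width=20, start=60)
for lower, m in means.items():
    print(f"X2 in class {lower}-: mean of X1 = {m:.2f}")
```

Plotting these class means against the class mid-points of X2 would give a graph of the kind described for Fig. 105.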

Lines of Regression. The tendency of the progressions of 
means to follow straight lines suggests the following hypothesis: 
Consider first the progression of the means of X1 with changes in 
X2. Suppose that X1 is related to X2 in such a way that an 
increase in X2 of one unit always produces an increase in X1 of, 
say, b units, b being a constant. If X2 were the only factor 
affecting X1, all the values of X1, when plotted, would fall exactly 
on a straight line and the progression of all means would be 
perfectly linear. If there were other forces affecting X1, however, 
causing it to be higher or lower than the value expected from its 
association with X2, then the actual values would not fall on a 
straight line but would be scattered about that line. If this view 
of the variation between X1 and X2 is adopted, a straight line 
fitted to the data should give the law of relationship between X1 
and X2, and the scatter about it should give the deviation from 
this line caused by the other factors affecting X1. 


Table 26. Means of Rows and Means of Columns 
From correlation table showing the relationship between second- and 
first-semester grades of 81 Mount Holyoke freshmen 

Progression of means of second-semester     Progression of means of first-semester 
English grade (X1) with successive values   English grade (X2) with successive values 
of first-semester English grade (X2).       of second-semester English grade (X1). 
Regression of X1 on X2                      Regression of X2 on X1 

Values of X2     Means of X1                Values of X1     Means of X2 
    60-            110.00                       60-            130.00 
    80-                                         80- 
   100-            110.00                      100-             83.33 
   120-            152.86                      120- 
   140-            178.00                      140-            160.00 
   160-            195.00                      160-            141.11 
   180-            203.33                      180-            170.00 
   200-            222.31                      200-            200.00 
   220-            241.11                      220-            225.29 
   240-            250.00                      240-            234.62 
   260-            266.00                      260-            256.67 
   280-            310.00                      280-            230.00 
                                               300-            290.00 


A similar view could be taken of the variation in the mean 
value of X2 with changes in X1 and would justify drawing a 
straight line to show the law of relationship between X2 and X1. 
The lines that are derived to show the relationship between the 
mean value of one variable and changes in value of another are 
called “lines of regression,” following Galton, who used this term 
in his original study of the relationship between the heights of 
children and the heights of their parents. 

The Line of Regression of X1 on X2. Suppose the above hypothesis 
is adopted, namely, that X1 is linearly related to X2 and 
that deviations from this relationship are the result of forces 
independent of X2. The statistical problem then becomes how to 
draw the line that is supposed to show this linear relationship. 

One of the simplest ways of finding the line of regression of X1 
on X2 is to plot the progression of the means of X1 for various 
values of X2 and to draw a line freehand through the means so that 
it seems to fit the progression of means. The great difficulty with 
this method is that it involves considerable personal discretion 
and that no two persons will necessarily draw the same line. 

An impersonal method of fitting a line to a given set of data is 
the so-called “method of least squares.” This fits a line to the 
data so that the sum of the deviations of the dependent variable 
from the line is zero and the sum of the squares of the deviations is 
a minimum (hence the name “method of least squares”). 
Mathematically, the first of these two conditions follows from 
the second, so that there is really only one condition, viz., that of 
least squares. 

The use of the method of least squares to fit lines to a set of 
data goes back to the beginning of the nineteenth century. It 
first came into prominence in 1806, when Adrien Marie Legendre 
(1752-1833) published a book on new methods of determining 
orbits of comets. After the publication of this book, Karl 
Friedrich Gauss (1777-1855), a German mathematician, claimed 
that he had been applying this principle since 1795. 

Later it was shown that, if the method is used to fit a line to a 
sample set of data, then, under particular circumstances, the line 
so determined is the best, or optimum, estimate of the population 
line. For example, if data are available as to the orbit of a comet 
or planet and if a line or curve is fitted to these data by the method 
of least squares, then the line or curve so obtained would be the 
most probable estimate of the true orbit.¹ 

The line of regression of X1 on X2 may be derived by the method 
of least squares as follows: Consider the point P, Fig. 107. This, 
according to hypothesis, would fall at P' if there were no forces 
associated with X1 other than X2. Supposedly, however, there 
are other forces that are independent of X2 and make X1 smaller 
than this average value so as to cause the point to be located 
actually at P. Since these other forces affect only X1, the point 
is deflected in a vertical direction. The line of regression of X1 on 
X2 is therefore to be obtained by minimizing the vertical deviations 
from the line. 

¹ See Smith and Duncan, Sampling Statistics, pp. 372-375. 

Let the equation of this line of regression of X1 on X2 be 

$$X_1' = a_{1.2} + b_{12}X_2$$

The deviations would then be 

$$d_{1.2} = X_1 - X_1' = X_1 - a_{1.2} - b_{12}X_2$$

and the problem is to determine a1.2 and b12 so as to minimize the 
sum of the squares of deviations like x1 - x1' shown in Fig. 107, i.e. 
(X1 - X1' = x1 - x1' because x1 = X1 - X̄1 and x1' = X1' - X̄1), 

$$\sum (X_1 - X_1')^2 = \sum (X_1 - a_{1.2} - b_{12}X_2)^2 = \text{minimum}$$

Fig. 107. Diagram illustrating the fitting of the line of regression of X1 on X2 
by the method of least squares (vertical deviations minimized). 

According to the differential calculus, the conditions for 
minimizing Σ(X1 - a1.2 - b12X2)² are that the partial derivative 
with respect to a1.2 and the partial derivative with respect to b12 
should both be zero. These conditions are 

$$\frac{\partial}{\partial a_{1.2}} \sum (X_1 - a_{1.2} - b_{12}X_2)^2 = -2\sum (X_1 - a_{1.2} - b_{12}X_2) = 0 = \sum d_{1.2} \qquad (1)$$

$$\frac{\partial}{\partial b_{12}} \sum (X_1 - a_{1.2} - b_{12}X_2)^2 = -2\sum (X_1 - a_{1.2} - b_{12}X_2)X_2 = 0 = \sum d_{1.2}X_2 \qquad (2)$$


If the parentheses are removed, these equations may be written 

$$N a_{1.2} + b_{12}\sum X_2 = \sum X_1$$

$$a_{1.2}\sum X_2 + b_{12}\sum X_2^2 = \sum X_1X_2$$

(Σa1.2 = Na1.2 because a1.2 is a constant.) 

The first gives a1.2 in terms of b12 as follows: 

$$a_{1.2} = \bar{X}_1 - b_{12}\bar{X}_2 \qquad (3)$$

(ΣX1/N = X̄1, and ΣX2/N = X̄2.) 

If this is substituted in the second, the value of b12 is found to be 

$$b_{12} = \frac{\sum X_1X_2 - N\bar{X}_1\bar{X}_2}{\sum X_2^2 - N\bar{X}_2^2} \qquad (4)$$


Equations (3) and (4) thus give the values of a1.2 and b12 in terms 
of the sample values of X1 and X2. If these values are grouped 
into class intervals and deviations are measured from an arbitrary 
origin, the last equation may be put in the form 


612 = 


Avhere 


Cl 


Kt) c. A"(f) 

i-i 


N 


N 


(5) 


If deviations are measured from the means of X1 and X2, then 

$$a_{1.2} = 0, \qquad b_{12} = \frac{\sum x_1x_2}{\sum x_2^2} \qquad (6)$$
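
As a rough numerical check on Eqs. (1) to (4), the following Python sketch fits a line to a few hypothetical observations (not taken from the text) and confirms that the two least-squares conditions hold. The helper names `fit_line` and `sum_sq_dev` are assumptions made for this illustration.

```python
def fit_line(x1, x2):
    """Return (a, b) of the least-squares line X1' = a + b*X2."""
    n = len(x1)
    mean1, mean2 = sum(x1) / n, sum(x2) / n
    sum_x1x2 = sum(u * v for u, v in zip(x1, x2))
    sum_x2sq = sum(v * v for v in x2)
    b = (sum_x1x2 - n * mean1 * mean2) / (sum_x2sq - n * mean2 ** 2)  # Eq. (4)
    a = mean1 - b * mean2                                             # Eq. (3)
    return a, b

def sum_sq_dev(a, b, x1, x2):
    """Sum of squared vertical deviations of X1 from the line a + b*X2."""
    return sum((u - a - b * v) ** 2 for u, v in zip(x1, x2))

# Hypothetical observations, chosen only for illustration.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
x1 = [2.1, 3.9, 6.2, 7.8, 10.0]

a, b = fit_line(x1, x2)
d = [u - a - b * v for u, v in zip(x1, x2)]  # deviations d from the fitted line

print(f"a = {a:.4f}, b = {b:.4f}")
print(f"sum of d      = {sum(d):.2e}")                                  # Eq. (1)
print(f"sum of d * X2 = {sum(di * v for di, v in zip(d, x2)):.2e}")     # Eq. (2)
```

Both sums come out as zero up to floating-point rounding, and moving either a or b away from the fitted values increases the sum of squared deviations.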





Fig. 108. Diagram illustrating the fitting of the line of regression of X2 on X1 
by the method of least squares (horizontal deviations minimized). 

In the next chapter, in which the work of measuring correlation 
is illustrated by numerical calculations, it is found that for the 
bivariate frequency distribution of Table 25, 




111 

SI 


51 
’ 81 


(81)(81) 


493 - 81 


M* 

(81)* 


= 0 8322 





Line of Regression of X2 on X1. The line of regression of X2 on 
X1 may also be obtained by either freehand or mathematical 
methods. A freehand line could be obtained by drawing a line 
through the progression of the means of X2 on X1. A mathematical 
line could be obtained by the method of least squares. 

The preceding section determined a mathematical formula for 
the line of regression of X1 on X2 by minimizing the sum of the 
squares of the vertical deviations. Now X2 is assumed to be the 
dependent variable, and the line of regression of X2 on X1 is 
determined by minimizing the sum of the squares of the horizontal 
deviations (see Fig. 108). Except for this difference, the process 
is precisely the same as that described for fitting the other line 
and will not be repeated here. If the line of regression of X2 on 
X1 is represented by the equation 

$$X_2' = a_{2.1} + b_{21}X_1$$

then minimizing Σ(X2 - X2')² = Σ(X2 - a2.1 - b21X1)² gives the 
following values for a2.1 and b21: 

$$a_{2.1} = \bar{X}_2 - b_{21}\bar{X}_1 \qquad (7)$$

$$b_{21} = \frac{\sum X_1X_2 - N\bar{X}_1\bar{X}_2}{\sum X_1^2 - N\bar{X}_1^2} \qquad (8)$$

If deviations are measured from the means of X1 and X2, then 

$$a_{2.1} = 0 \qquad (9)$$

$$b_{21} = \frac{\sum x_1x_2}{\sum x_1^2} \qquad (10)$$

For the data of Table 25 the line of regression of X2 on X1 is thus 
found to be 

$$X_2' = -5.518 + 0.9642X_1$$

This is shown in Fig. 108. 
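
That the two lines of regression are in general different lines can be seen from a small sketch. The data are hypothetical, X1 is taken to be plotted on the vertical axis, and the helper `regression` is a name assumed here; it uses the deviation form of the slope together with Eqs. (3) and (7) for the intercept.

```python
def regression(dep, ind):
    """Least-squares line dep' = a + b*ind; returns (a, b)."""
    n = len(dep)
    mean_dep, mean_ind = sum(dep) / n, sum(ind) / n
    num = sum((d - mean_dep) * (i - mean_ind) for d, i in zip(dep, ind))
    den = sum((i - mean_ind) ** 2 for i in ind)
    b = num / den
    return mean_dep - b * mean_ind, b

# Hypothetical paired observations.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

a12, b12 = regression(x1, x2)  # X1 on X2: vertical deviations minimized
a21, b21 = regression(x2, x1)  # X2 on X1: horizontal deviations minimized

# Rewrite the second line as X1 = intercept + slope * X2 so that both
# lines are expressed against the same axes.
slope_from_21 = 1 / b21
intercept_from_21 = -a21 / b21

print(f"X1 on X2:             X1 = {a12:.2f} + {b12:.2f} X2")
print(f"X2 on X1, rearranged: X1 = {intercept_from_21:.2f} + {slope_from_21:.2f} X2")
```

The two lines cross at the point of means but have different slopes unless every point lies exactly on one straight line.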




Interpretation of a Line of Regression. A line of regression 
of one variable on another is to be interpreted as indicating the 
values of the first (the dependent variable) that would be 
obtained for various values of the second (the independent 
variable) if no other forces were affecting the dependent variable. 
If knowledge of the independent variable is all that is to be had, 
then the line of regression gives the best estimate that may be 
made of the dependent variable. 

The regression statistic a (that is, a1.2 or a2.1) gives the value 
of the dependent variable when the independent variable is zero. 
It is of only arbitrary significance, since its value is affected by 
the origin selected for measuring the independent variable as 
well as the units of measurement. The regression statistic b 
(that is, b12 or b21) is independent of the origin selected and 
indicates the change that would occur in the dependent variable per 
unit change in the independent variable. In the line of regression 
of X1 on X2, for example, when X2 increases by one unit, X1 
increases or decreases by b12 units, depending on the sign of b12. 
The value of b12 will not be affected by proportional changes in 
the units of X1 and X2. Similar statements hold for b21 in the 
case of the line of regression of X2 on X1. 
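
The behavior of a and b under a change of origin or of units can be illustrated with a short sketch. The observations are hypothetical and the helper `regression` is a name assumed for this illustration only.

```python
def regression(dep, ind):
    """Least-squares line dep' = a + b*ind; returns (a, b)."""
    n = len(dep)
    mean_dep, mean_ind = sum(dep) / n, sum(ind) / n
    num = sum((d - mean_dep) * (i - mean_ind) for d, i in zip(dep, ind))
    den = sum((i - mean_ind) ** 2 for i in ind)
    b = num / den
    return mean_dep - b * mean_ind, b

# Hypothetical observations.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
x1 = [2.1, 3.9, 6.2, 7.8, 10.0]

a, b = regression(x1, x2)

# Measure X2 from a different origin: every value is raised by 100.
a_shift, b_shift = regression(x1, [v + 100 for v in x2])

# Change the units of both variables in the same proportion (times 10).
a_scale, b_scale = regression([10 * u for u in x1], [10 * v for v in x2])

print(f"original:       a = {a:.2f}, b = {b:.2f}")
print(f"shifted origin: a = {a_shift:.2f}, b = {b_shift:.2f}")
print(f"rescaled units: a = {a_scale:.2f}, b = {b_scale:.2f}")
```

Only a responds to the shift of origin; b is the same in all three fits, in line with the statement that b is unaffected by the origin and by proportional changes in the units of both variables.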

Standard Deviation about Means or Line of Regression. If the 
progressions of the means or the lines of regression are used to 
measure the average relationship between two variables, some 
additional measure is desirable to determine the degree of 
representativeness of these measures. In the case of a monovariate 
distribution, it will be recalled, the representativeness of the 
mean depended upon how closely the cases were scattered around 
this mean value. This dispersion was measured by the standard 
deviation or some other measure. Similarly, in the present 
instance, the representativeness of the means of X1, say, for 
various values of X2 will be shown by the dispersion of the cases 
around each mean. The standard deviation of the cases in each 
column around the mean of that column may thus be taken to 
show how well the mean represents the cases in the column. 
The same is true for any row. 

In Table 27 are given the standard deviations of the columns 
of Table 25. The zero values refer to the columns in which 
there is only one case. The other values center around 16, their 
average being 16.9. 





It is to be noted that the average standard deviation from the 
means of the columns, as well as the individual standard deviations 
from which this average is calculated, are considerably 
less than the total standard deviation of X1, namely, σ1 = 43.9. 
The column means are thus much more representative of the 
column values of X1 than the grand mean is of all the X1's. 


Table 27. Standard Deviations for Columns of Table 25 

Column    N_c    Sum of squared deviations    σ_c²       σ_c 
                 from column mean 
 (1)        2              0                     0         0 
 (2)        0 
 (3)        1              0                     0         0 
 (4)        7          8,342.86              1,191.8      34.5 
 (5)        5            480.00                 96.0       9.8 
 (6)        8          1,400.00                175.0      13.2 
 (7)        9          4,800.00                533.3      23.1 
 (8)       13          2,830.77                217.8      14.8 
 (9)       18          6,577.78                365.4      19.1 
(10)       11          3,200.00                290.9      17.1 
(11)        5            320.00                 64.0       8.0 
(12)        2              0                     0         0 
Total      81 


This may be explained by the fact that much of the total variation 
in X1 is due to the variation from column to column, a 
variation that is presumably due to association with X2. When 
this variation is eliminated, the remaining variation is considerably 
reduced. A similar analysis would show the same results 
with respect to variation around the means of the rows. 

If the association between X1 and X2 is measured by a straight 
line, the representativeness of this line may be measured by the 
dispersion of cases around it. Such a measure would be the 
standard deviation of the deviations from the line. The standard 
deviation of the vertical deviations from the line of regression 
of X1 on X2 will measure the representativeness of that line, and 
the standard deviation of the horizontal deviations from the line 
of regression of X2 on X1 will measure its representativeness. 
In either case, σ² equals the sum of the squared deviations from 
the line divided by N. If the line is fitted by the method of 
least squares, the sum of the squared deviations from the line 
may be computed from the equations* 

$$N\sigma_{1.2}^2 = \sum d_{1.2}^2 = \sum X_1^2 - a_{1.2}\sum X_1 - b_{12}\sum X_1X_2 \qquad (11)$$

and 

$$N\sigma_{2.1}^2 = \sum d_{2.1}^2 = \sum X_2^2 - a_{2.1}\sum X_2 - b_{21}\sum X_1X_2 \qquad (12)$$

so that σ²1.2 = Σd²1.2/N and σ²2.1 = Σd²2.1/N. 


These standard deviations from the lines of regression will always 
be less than the total standard deviations because the variation 
represented by the line of regression has been eliminated by 
taking deviations from the line. 
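
The short-cut of Eq. (11) may be compared with a direct computation of the deviations in a brief sketch. The data are hypothetical, and the function name `standard_error_of_estimate` is an assumption introduced for the illustration.

```python
from math import sqrt

def standard_error_of_estimate(x1, x2):
    """Standard deviation of X1 about its regression line on X2.

    Returns the value from the short-cut of Eq. (11) and the value
    obtained directly from the individual deviations.
    """
    n = len(x1)
    mean1, mean2 = sum(x1) / n, sum(x2) / n
    sum_x1x2 = sum(u * v for u, v in zip(x1, x2))
    b = (sum_x1x2 - n * mean1 * mean2) / (sum(v * v for v in x2) - n * mean2 ** 2)
    a = mean1 - b * mean2
    shortcut = sum(u * u for u in x1) - a * sum(x1) - b * sum_x1x2   # Eq. (11)
    direct = sum((u - a - b * v) ** 2 for u, v in zip(x1, x2))
    return sqrt(shortcut / n), sqrt(direct / n)

# Hypothetical observations.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
x1 = [2.1, 3.9, 6.2, 7.8, 10.0]

se_shortcut, se_direct = standard_error_of_estimate(x1, x2)
mean1 = sum(x1) / len(x1)
total_sd = sqrt(sum((u - mean1) ** 2 for u in x1) / len(x1))

print(f"about the line (short-cut): {se_shortcut:.4f}")
print(f"about the line (direct):    {se_direct:.4f}")
print(f"total standard deviation:   {total_sd:.4f}")
```

For these figures the standard deviation about the line is a small fraction of the total standard deviation, since most of the variation in X1 is accounted for by the line.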

The average standard deviations around the means of columns 
or rows and the standard deviations around the lines of regression 
may be called “first-order standard deviations,” in contrast to 
the total standard deviations, which may be called “zero-order 
standard deviations.” Sometimes the first-order standard 
deviations are called “standard errors of estimate,” since they 
indicate the error involved in using a column or row mean or a 
line of regression as an estimate of the dependent variable. 

If the association between X1 and X2, say, is assumed to be 
measured by the means of X1 for given values of X2 or by the 
line of regression of X1 on X2, then the smallness of the first-order 
standard deviations relative to the zero-order standard 
deviations will give some measure of the degree of representativeness 
of these measures of association. As will be seen in the next 
section, this measure of the degree of representativeness of a line 
of regression is closely related to the so-called “Pearsonian 
coefficient of correlation.” As a measure of the degree of 
representativeness of a progression of means, it is closely related to the 
“correlation ratio,” which is discussed in Chap. XV, Nonlinear 
Correlation. 

* The proof of this is as follows: 

$$\sum d_{1.2}^2 = \sum d_{1.2}(X_1 - a_{1.2} - b_{12}X_2) = \sum d_{1.2}X_1 - a_{1.2}\sum d_{1.2} - b_{12}\sum d_{1.2}X_2$$

But by the least-squares equations (1) and (2), Σd1.2 = 0 and Σd1.2X2 = 0. 
Hence 

$$\sum d_{1.2}^2 = \sum d_{1.2}X_1 = \sum X_1^2 - a_{1.2}\sum X_1 - b_{12}\sum X_1X_2$$

The Pearsonian Coefficient of Correlation. The progressions 
of means and lines of regression described above were concerned 
with describing the “law of relationship” between the two 
variables. They gave the average value of one variable associated 
with given values of the other variable and showed how 
these average values tended to change in unison with the other 
variable. In this section, a measure of the degree of association 
between the two variables will be described. This measure is 
known as the Pearsonian coefficient of correlation after the man 
who devised it. 


Fig. 109. A bivariate scatter diagram showing the joint variation in imports 
into and exports from the United States. [United States Department of Commerce, 
Monthly Summary of Foreign and Domestic Commerce of the United States, Vol. 20 
(March, 1940), p. 37; Survey of Current Business, Vol. 21 (March, 1941), p. 37; 
Vol. 22 (March, 1942), pp. 5-19.] 


The coefficient of correlation suggested by Karl Pearson in 
1890 is 

$$r_{12} = \frac{\sum x_1x_2}{N\sigma_1\sigma_2} \qquad (13)$$

In this equation, x1 and x2 refer to deviations from the mean 
and N to the number of pairs of cases. For the sake of simplicity, 
this coefficient will now be explained by reference to a bivariate 
distribution in which the cases are not grouped into class intervals. 

Arithmetic View of r. Table 28 and Fig. 109 show the joint 
variation of two variables. They indicate that the large values 
of X1, representing total exports from the United States, 1932- 
1941, are associated, for the most part, with the large values of 
X2, which represent total imports for consumption into the 
United States during the same period of years. 

The average of X1, designated X̄1, is found to be 2.89, and the 
average of X2, designated X̄2, is 2.19. The deviations of each 
variable from its respective mean are calculated and entered in 
the third and fourth columns of the table. The products of x1 
and x2, the product deviations, are calculated and the result 
entered in the appropriate division of the last column. The sum 
of column (5), that is, Σx1x2 (the sum of the product deviations), 
is 5.679. 

Table 28. Exports and Imports of Merchandise, United States, 
1932-1941 
(In billions of dollars) 

        Total       Total       Deviations from       Product deviations, 
Year    exports,    imports,    respective means      x1x2 
        X1          X2          x1         x2 
        (1)         (2)         (3)        (4)        (5) 
                                                       +           - 
1932     1.6         1.3        -1.29      -0.89      1.1481 
1933     1.7         1.5        -1.19      -0.69      0.8211 
1934     2.1         1.6        -0.79      -0.59      0.4661 
1935     2.3         2.0        -0.59      -0.19      0.1121 
1936     2.5         2.4        -0.39       0.21                  -0.0819 
1937     3.3         3.0         0.41       0.81      0.3321 
1938     3.1         1.9         0.21      -0.29                  -0.0609 
1939     3.2         2.3         0.31       0.11      0.0341 
1940     4.0         2.6         1.11       0.41      0.4551 
1941     5.1         3.3         2.21       1.11      2.4531 
Σ       28.9        21.9         0          0         5.8218      -0.1428 

X̄1 = 2.89; X̄2 = 2.19; Σx1x2 = 5.6790 

In Fig. 109, an X1 and an X2 scale are set up in such a way 
as to accommodate the range of these variables as shown in 
columns (1) and (2) of Table 28. Lines perpendicular to the 
respective scales at the points X1 = 2.89 and X2 = 2.19 are 
drawn so that the figure is divided into four quadrants: quadrant 
I containing values of X1 and X2 that are both higher than 
average (hence both x1 and x2 are positive); quadrant II containing 
values of X2 that are smaller than average and values of X1 
that are larger than average (hence x2 is negative and x1 is 
positive); quadrant III containing values of X1 and X2 that are both 
smaller than average (hence both x1 and x2 are negative); and 
quadrant IV containing values of X2 that are larger than average 
and values of X1 that are smaller than average (hence x2 is 
positive and x1 is negative). The origin of the coordinates x1, 
x2 is at the intersection of the perpendicular lines at the X̄1 
and X̄2 of the scales. For example, measured from the original 
origin, the point P has coordinates X1 = 3.3, X2 = 3.0; but 
measured from the intersection of the means the coordinates of 
point P are x1 = 0.41, x2 = 0.81 [see columns (1), (2), (3), and 
(4) for 1937, Table 28]. It should be noted that only one point 
is plotted in the fourth quadrant; this is the 1936 pair of variables 
from Table 28. The 1938 pair of variables from Table 28 
appears as the sole point in the second quadrant. These two 
pairs of variables, 1936 and 1938, are the only ones in the set 
that have negative product deviations. The rest of the pairs of 
observations appear either in the first or third quadrant because 
their product deviations are positive quantities. 

If the fluctuations of two variables are so associated that their 
plottings appear predominantly in quadrants I and III, the 
Σx1x2 will be positive. This will be so when larger than average 
values of X1 are associated with larger than average values of 
X2 (quadrant I) and smaller than average values of X1 are 
associated with smaller than average values of X2 (quadrant III). 
Also, if the two variables are so associated that their plottings 
appear predominantly in quadrants II and IV, the sum of the 
product deviations will be negative. This will be so when smaller 
than average values of X2 are associated with larger than average 
values of X1 (quadrant II) and when larger than average values 
of X2 are associated with smaller than average values of X1 
(quadrant IV). Furthermore, if the plottings are equally 
distributed throughout the four quadrants, the sum of the 
product deviations will approach zero because of the canceling 
of plus and minus product deviations. This will be so when 
there is no tendency for association of the variables in any manner, 
that is, when smaller than average values of X1 are associated 
about as often with larger values of X2 as with smaller values of 
X2, etc. 
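
The quadrant argument can be traced through the export and import figures of Table 28 with a short Python sketch. The variable names are assumptions made for the illustration; the quadrant numbering follows the convention of the text, in which quadrant II holds points with x1 positive and x2 negative.

```python
# Exports (X1) and imports (X2), billions of dollars, 1932-1941 (Table 28).
exports = [1.6, 1.7, 2.1, 2.3, 2.5, 3.3, 3.1, 3.2, 4.0, 5.1]
imports = [1.3, 1.5, 1.6, 2.0, 2.4, 3.0, 1.9, 2.3, 2.6, 3.3]

n = len(exports)
mean1 = sum(exports) / n
mean2 = sum(imports) / n

quadrant_counts = {"I": 0, "II": 0, "III": 0, "IV": 0}
sum_product_deviations = 0.0

for X1, X2 in zip(exports, imports):
    x1, x2 = X1 - mean1, X2 - mean2          # deviations from the means
    sum_product_deviations += x1 * x2
    if x1 > 0 and x2 > 0:
        quadrant = "I"
    elif x1 > 0 and x2 < 0:
        quadrant = "II"
    elif x1 < 0 and x2 < 0:
        quadrant = "III"
    else:                                    # x1 < 0 and x2 > 0 (no zero deviations occur here)
        quadrant = "IV"
    quadrant_counts[quadrant] += 1

print(quadrant_counts)
print(f"sum of product deviations = {sum_product_deviations:.4f}")
```

Eight of the ten points fall in quadrants I and III, which is why the positive products outweigh the two negative ones.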

A similar procedure is followed in Table 29 and Fig. 110, in 
which X3 is the price of United States government bonds and 
X4 is the yield on such bonds. Casual inspection of the data 
reveals that when the price of bonds is high, yield is low, and 
vice versa. 


Table 29. Prices and Yields on United States Government Bonds, 
1932-1941 
Averages of all bonds outstanding due or callable after 12 years 

        Average      Average      Deviations from       Product deviations, 
Year    price,       yield,       respective means      x3x4 
        X3           X4           x3         x4 
        (1)          (2)          (3)        (4)        (5) 
                                                         +          - 
1932      98.8        3.68        -5.62      0.949                 -5.333 
1933     102.3        3.31        -2.12      0.579                 -1.227 
1934     104.6        3.12         0.18      0.389      0.070 
1935     105.5        2.79         1.08      0.059      0.064 
1936     103.7        2.65        -0.72     -0.081      0.583 
1937     101.7        2.68        -2.72     -0.051      0.139 
1938     103.4        2.56        -1.02     -0.171      0.174 
1939     106.0        2.36         1.58     -0.371                 -0.586 
1940     107.2        2.21         2.78     -0.521                 -1.448 
1941     111.0        1.95         6.58     -0.781                 -5.139 
Σ      1,044.2       27.31         0         0          1.030     -13.733 

X̄3 = 104.42; X̄4 = 2.731; Σx3x4 = -12.703 


The sum of the product deviations in Table 29 is a negative 
amount, namely, -12.703. Comparison of Figs. 109 and 110 
will at once bring out the contrast in the location of pairs of 
plotted points; whereas in Fig. 109 the points are mainly in 
quadrants I and III, the points in Fig. 110 appear principally in 
quadrants II and IV. 

Again, the same procedure is followed in Table 30 and Fig. 111, 
in which X5 is the height of Princeton freshmen and X6 is the 
grade of these freshmen in their examination in economics. 
In Table 30 the negative and positive product deviations so 
nearly offset each other that the sum of product deviations is 
only 1.33. The tendency for the scatter of points throughout 
all four quadrants is depicted in Fig. 111. 

Fig. 110. A bivariate scatter diagram showing the joint variation in the 
price and yield of United States government bonds. [Federal Reserve Bulletin, 
December, 1938, p. 1045; July, 1940, pp. 701-702; and Survey of Current Business, 
Vol. 21 (March, 1941), p. 36; Vol. 22 (March, 1942), p. 18.] 

These three arithmetic illustrations appear to show that 
the sum of the product deviations from the arithmetic means 
of variables, Σx1x2, can be used to measure the extent to which 
the variables are associated or related. Following are the reasons 
for this: 

1. When smaller than average values of X1 are associated 
with smaller than average values of X2, the x1x2 products, being 
products of -x1 and -x2, are positive, as shown in Tables 28 to 30. 

2. When larger than average values of X1 are associated with 
larger than average values of X2, the x1x2 products, being 
products of +x1 and +x2, are also positive, as shown in Tables 28 to 30. 





Fig. 111. A bivariate scatter diagram showing the joint variation (or lack 
of it) between the heights of Princeton freshmen and their grades on an 
examination in economics. 

for another set of paired variables. A small sum of product 
deviations may result from the fact that a small number of cases 
is included, and a large sum of product deviations may indicate 
merely that a large number of cases is involved; and yet the actual 
degree of correlation might be the same in the two sets. In the 
second instance the larger sum of product deviations is due solely 
to the fact that it resulted from a larger number of cases. It 
seems obvious that an average of the product deviations is 
required. Such an average can be obtained by dividing the 
sum of product deviations by N. Thus the average product 
deviation is Σx1x2/N. 
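
The dependence of the sum of product deviations on the number of cases can be shown with a small sketch. The observations are hypothetical, and the helper `product_deviation_sum` is a name assumed for the illustration; the larger set is simply the smaller set entered twice, so the degree of association is the same in both.

```python
def product_deviation_sum(x1, x2):
    """Sum of products of deviations from the respective means."""
    n = len(x1)
    mean1, mean2 = sum(x1) / n, sum(x2) / n
    return sum((u - mean1) * (v - mean2) for u, v in zip(x1, x2))

# Hypothetical observations.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 5.0]

sum_small = product_deviation_sum(x1, x2)
sum_large = product_deviation_sum(x1 * 2, x2 * 2)   # the same pairs, each entered twice

avg_small = sum_small / len(x1)
avg_large = sum_large / (2 * len(x1))

print(f"N = 4: sum = {sum_small:.2f}, average = {avg_small:.2f}")
print(f"N = 8: sum = {sum_large:.2f}, average = {avg_large:.2f}")
```

The sum doubles with the number of cases while the average product deviation is unchanged.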

2. The product deviations in terms of original units of the 
data are without meaning because of nonhomogeneity of units. 
Suppose the X1 variable is the price of wheat per bushel, which 
would be expressed in dollars and cents, and the X2 variable is 
the birth rate. Or, again, suppose the X1 variable is the height 
of men expressed in inches and the X2 variable is the weight of 
men expressed in pounds. Or suppose the X1 variable is the 
marriage rate and the X2 variable is the volume of trade, or 
prices, etc. In all such pairs of variables, the product deviations 
in terms of original units are meaningless; they are products of 
nonhomogeneous things. What meaning can be ascribed to the 
product of inches and pounds or to the product of marriage rates 
and volume of trade? It is necessary to perceive in the situation 
a general common denominator. 

The comparable thing being compared is the purely abstract 
thing, deviation above or below average; accordingly, the standard 
deviation σ may be used as a general common denominator. 
Whatever the original unit of measurement, if the variable is 
normally distributed, the standard deviation represents approximately 
one-sixth the range of that variable. The standard deviation 
is a unit of deviation from the mean measuring a common 
characteristic among all variables and is, therefore, a homogeneous 
unit among all variables. Consequently, the standard 
deviation is used to reduce these product deviations to terms of 
comparability with each other. When this is done, the average 
product deviation becomes a measure of correlation known as the 
Pearsonian coefficient, namely, 

$$r_{12} = \frac{\sum \dfrac{x_1}{\sigma_1}\cdot\dfrac{x_2}{\sigma_2}}{N}$$

Since σ1 and σ2 are constants in each particular problem, this 
equation may be written as follows: 

$$r_{12} = \frac{\sum x_1x_2}{N\sigma_1\sigma_2}$$

This is the usual form in which the formula for the Pearsonian 
coefficient of correlation is given. The value of this average 
expression fluctuates between the limits +1 and -1. Any 
value greater than +1 or less than -1 is a mistake, not an error 
in the statistical sense. If r = +1, this means perfect positive 
correlation (large values of X1 are associated with large values of 
X2, and vice versa); if r = -1, this means perfect negative 
correlation (large values of X1 are associated with small values of 
X2, and vice versa); if r = 0, this means no linear correlation. 
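
The effect of dividing by the standard deviations can be checked numerically. In the sketch below the heights and weights are hypothetical, the conversion factors are rounded, and `pearson_r` is a helper name assumed for the illustration.

```python
from math import sqrt

def pearson_r(x1, x2):
    """Pearsonian coefficient: sum of product deviations over N * sigma1 * sigma2."""
    n = len(x1)
    mean1, mean2 = sum(x1) / n, sum(x2) / n
    d1 = [u - mean1 for u in x1]
    d2 = [v - mean2 for v in x2]
    sigma1 = sqrt(sum(u * u for u in d1) / n)
    sigma2 = sqrt(sum(v * v for v in d2) / n)
    return sum(u * v for u, v in zip(d1, d2)) / (n * sigma1 * sigma2)

# Hypothetical heights (inches) and weights (pounds).
heights_in = [64.0, 66.0, 68.0, 70.0, 72.0]
weights_lb = [130.0, 150.0, 145.0, 170.0, 165.0]

r_original = pearson_r(heights_in, weights_lb)

# The same cases in centimeters and kilograms (rounded conversion factors).
r_converted = pearson_r([2.54 * h for h in heights_in],
                        [0.4536 * w for w in weights_lb])

# Exact straight-line relationships give the limiting values.
r_perfect_positive = pearson_r(heights_in, [3 * h + 1 for h in heights_in])
r_perfect_negative = pearson_r(heights_in, [-2 * h for h in heights_in])

print(f"r in original units:  {r_original:.4f}")
print(f"r in converted units: {r_converted:.4f}")
print(f"limits: {r_perfect_positive:.4f}, {r_perfect_negative:.4f}")
```

The coefficient is the same in either system of units, whereas the sum of product deviations itself would change with the units chosen.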

Calculation of the Coefficient of Correlation. The data in 
Table 28 may be taken to illustrate the detailed calculation 
of the coefficient of correlation, by first making all product 
deviations in terms of respective standard-deviation units. 

Table 31. Calculation of Coefficient of Correlation between 
United States Exports and Imports, 1932-1941 

Deviations from     Squares of deviations   Deviations from         Product deviations 
respective means,   from respective means   respective means in     in standard-deviation 
billions of                                 standard-deviation      units, 
dollars                                     units                   (x1/σ1)(x2/σ2) 
  x1       x2         x1²        x2²         x1/σ1      x2/σ2 
  (1)      (2)        (3)        (4)         (5)        (6)         (7) 
                                                                     +          - 
-1.29    -0.89      1.6641     0.7921      -1.251     -1.435       1.795 
-1.19    -0.69      1.4161     0.4761      -1.154     -1.112       1.283 
-0.79    -0.59      0.6241     0.3481      -0.766     -0.951       0.728 
-0.59    -0.19      0.3481     0.0361      -0.572     -0.306       0.175 
-0.39     0.21      0.1521     0.0441      -0.378      0.338                  -0.128 
 0.41     0.81      0.1681     0.6561       0.398      1.306       0.520 
 0.21    -0.29      0.0441     0.0841       0.204     -0.467                  -0.095 
 0.31     0.11      0.0961     0.0121       0.301      0.177       0.053 
 1.11     0.41      1.2321     0.1681       1.077      0.661       0.712 
 2.21     1.11      4.8841     1.2321       2.144      1.789       3.836 
Σ                  10.6290     3.8490                              9.102      -0.223 

σ1 = 1.031; σ2 = 0.6204; Σ(x1/σ1)(x2/σ2) = 8.879 (net) 

The standard deviations were calculated from the sums of columns (3) and (4). 

The Pearsonian coefficient of correlation may now be quickly 
calculated from the sum of product deviations in standard- 
deviation units [the foot of column (7) of Table 31]. This sum 
divided by N is the coefficient of correlation. In other words, 

$$r_{12} = \frac{\sum \dfrac{x_1}{\sigma_1}\cdot\dfrac{x_2}{\sigma_2}}{N} = \frac{8.879}{10} = 0.8879$$















It is not necessary, however, to divide each deviation by its 
standard deviation, because the two standard deviations are 
constants. Table 28 having been constructed, if the standard 
deviations are calculated, as in columns (3) and (4), Table 31, 
it is then necessary only to apply Eq. (13) as follows: 

$$r_{12} = \frac{\sum x_1x_2}{N\sigma_1\sigma_2} = \frac{5.6790}{10(1.031)(0.6204)} = \frac{0.5679}{0.6396} = 0.8879$$
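
Both routes to r for the export and import data can be reproduced in a few lines. The sketch below uses the figures of Table 28; the variable names are assumptions made for the illustration.

```python
from math import sqrt

# Exports (X1) and imports (X2), billions of dollars, 1932-1941 (Table 28).
exports = [1.6, 1.7, 2.1, 2.3, 2.5, 3.3, 3.1, 3.2, 4.0, 5.1]
imports = [1.3, 1.5, 1.6, 2.0, 2.4, 3.0, 1.9, 2.3, 2.6, 3.3]

n = len(exports)
mean1, mean2 = sum(exports) / n, sum(imports) / n
x1 = [u - mean1 for u in exports]            # deviations from the mean of X1
x2 = [v - mean2 for v in imports]            # deviations from the mean of X2

sigma1 = sqrt(sum(u * u for u in x1) / n)
sigma2 = sqrt(sum(v * v for v in x2) / n)

# Route 1: average of product deviations in standard-deviation units.
r_standardized = sum((u / sigma1) * (v / sigma2) for u, v in zip(x1, x2)) / n

# Route 2: sum of product deviations divided once by N * sigma1 * sigma2.
r_direct = sum(u * v for u, v in zip(x1, x2)) / (n * sigma1 * sigma2)

print(f"sigma1 = {sigma1:.3f}, sigma2 = {sigma2:.4f}")
print(f"r (standardized) = {r_standardized:.4f}")
print(f"r (direct)       = {r_direct:.4f}")
```

The two routes agree, so the division of each deviation by its standard deviation may be skipped.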


Accordingly, columns (5) to (7) of Table 31 need not be computed. 
For example, to calculate the coefficients of correlation for the 
data in Tables 29 and 30, the standard deviations are calculated 
and the respective coefficients of correlation are then obtained 
as follows: 

Correlation between prices and yields on United States 
government bonds, 1932-1941 (σ3 = 3.16, σ4 = 0.51): 

$$r_{34} = \frac{\sum x_3x_4}{N\sigma_3\sigma_4} = \frac{-12.703}{10(3.16)(0.51)} = \frac{-1.2703}{1.6116} = -0.7882$$

Correlation between heights and grades of freshmen 
(σ5 = 1.889, σ6 = 10.96): 

$$r_{56} = \frac{\sum x_5x_6}{N\sigma_5\sigma_6} = \frac{1.33}{10(1.89)(10.96)} = \frac{0.133}{(1.89)(10.96)} = \frac{0.133}{20.7} = 0.0064$$

For a small number of cases it is possible to calculate a 
coefficient of correlation according to the procedure illustrated in the 
tables and calculations immediately preceding. For a large 
number of pairs of values it is desirable to group the pairs into 
class intervals. The value of X1 for each pair then becomes the 
mid-point of the interval to which the X1 value belongs; the 
value of X2 for the pair will be the mid-point of the interval 
to which the X2 value belongs. If more than one pair of cases 
belongs to the same X1 and X2 intervals, the frequency of such 
pairs is determined. This procedure was illustrated by the 
analysis of Tables 24 and 25 in discussing the bivariate frequency 
distribution of 81 Mount Holyoke freshmen. When the bivariates 
are arranged in a bivariate frequency distribution, r12 is 
measured by ΣFx1x2/Nσ1σ2, where F represents the frequency of 
pairs of values belonging to the same X1 and X2 intervals. For 
example, in Table 25 (for X1 = 160-, X2 = 120-), F = 5, 
x1 = 47.4, and x2 = 74.1. Accordingly, this Fx1x2 (for X1 = 160-, 
X2 = 120-) is equal to 5(47.4)(74.1) = 17,561.7. When this 
procedure is followed for the entire table, the ΣFx1x2 is obtained. 
Special methods for calculating r from grouped data are described 
in detail in Chap. XIV, in which advantage is taken of certain 
short-cut procedures. 

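* See pp. 325-326.

Relationship between Lines of Regression and r. If a line
of regression is fitted by the method of least squares, the values
of $b_{12}$ and $b_{21}$ are given by Eqs. (6) and (10). It will now
be shown that these reduce to formulas involving $r_{12}$. From the
definition of $r_{12} = \Sigma x_1x_2/N\sigma_1\sigma_2$,

$$ \Sigma x_1x_2 = N\sigma_1\sigma_2 r_{12} $$

Secondly, note that, from the definition of $\sigma_2^2 = \Sigma x_2^2/N$,

$$ \Sigma x_2^2 = N\sigma_2^2 $$

Hence,

$$ b_{12} = \frac{\Sigma x_1x_2}{\Sigma x_2^2} = \frac{N\sigma_1\sigma_2 r_{12}}{N\sigma_2^2} = r_{12}\frac{\sigma_1}{\sigma_2} $$

In the same manner it can be shown that

$$ b_{21} = r_{12}\frac{\sigma_2}{\sigma_1} $$

Hence, if deviations of the variables are measured from their
mean values, the lines of regression may be written (in this case
the $a_{1.2}$ and $a_{2.1}$ are both zero)

$$ x_1' = r_{12}\frac{\sigma_1}{\sigma_2}x_2 \qquad (14) $$

$$ x_2' = r_{12}\frac{\sigma_2}{\sigma_1}x_1 \qquad (15) $$

If the first of these equations is divided by $\sigma_1$ and the
second by $\sigma_2$, they become

$$ \frac{x_1'}{\sigma_1} = r_{12}\frac{x_2}{\sigma_2} \qquad \frac{x_2'}{\sigma_2} = r_{12}\frac{x_1}{\sigma_1} $$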
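As a numerical check on this reduction, the following sketch compares the least-squares slopes with $r_{12}\sigma_1/\sigma_2$ and $r_{12}\sigma_2/\sigma_1$. It assumes the ten pairs listed in Table 32; the variable names are illustrative only.

```python
import math

# The ten (X1, X2) pairs as listed in Table 32.
X1 = [1.6, 1.7, 2.1, 2.3, 2.5, 3.3, 3.1, 3.2, 4.0, 5.1]
X2 = [1.3, 1.5, 1.6, 2.0, 2.4, 3.0, 1.9, 2.3, 2.6, 3.3]

n = len(X1)
m1, m2 = sum(X1) / n, sum(X2) / n
x1 = [v - m1 for v in X1]          # deviations from the mean of X1
x2 = [v - m2 for v in X2]          # deviations from the mean of X2

sum_x1x2 = sum(a * b for a, b in zip(x1, x2))
sum_x1sq = sum(a * a for a in x1)
sum_x2sq = sum(b * b for b in x2)

sd1 = math.sqrt(sum_x1sq / n)
sd2 = math.sqrt(sum_x2sq / n)
r = sum_x1x2 / (n * sd1 * sd2)

b12 = sum_x1x2 / sum_x2sq          # least-squares slope of x1 on x2
b21 = sum_x1x2 / sum_x1sq          # least-squares slope of x2 on x1

print(b12, r * sd1 / sd2)          # the two numbers should coincide
print(b21, r * sd2 / sd1)          # likewise
```

A by-product of the two formulas is that the product of the two slopes equals $r_{12}^2$.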


Thus it may be concluded that if the variables are measured in
standard-deviation units the slopes of the lines of regression are
the Pearsonian coefficient of correlation. In this light, r is
the change in the average value of $X_1$ expressed in $\sigma_1$
units when $X_2$ changes by one $\sigma_2$ unit. It is also the
change in the average value of $X_2$ expressed in $\sigma_2$ units
when $X_1$ changes by one $\sigma_1$ unit. This property of r is
shown geometrically in Fig. 112. This shows that the slope of the
regression of $X_1$ on $X_2$ is r with reference to the $X_2$-axis
and 1/r with reference to the $X_1$-axis; that is to say, the line
of regression of $X_1$ on $X_2$ makes an angle with the $X_2$-axis
whose tangent is r and an angle with the $X_1$-axis whose tangent
is 1/r. The slope of the regression of $X_2$ on $X_1$ is likewise
r, but with reference to the $X_1$-axis, and 1/r with reference to
the $X_2$-axis; that is to say, the line of regression of $X_2$ on
$X_1$ makes an angle with the $X_1$-axis whose tangent is r and an
angle with the $X_2$-axis whose tangent is 1/r. In other words, in
Fig. 112, angle $\alpha$ equals angle $\alpha'$, and angle $\theta$
equals angle $\theta'$. All this is on the assumption that the
variables are expressed in standard-deviation units as indicated
in the equations above.

Fig. 112.—Diagram showing relationship between lines of regression
and the Pearsonian coefficient of correlation r.

Thus, in Fig. 112, the tangent of the angle $\alpha$ is r, and that
of the angle $\theta$ is 1/r. When $|\alpha| \le \pi/4$,
$r = \tan\alpha \le 1$. Geometrically, within the limits
$|\alpha| \le \pi/4$, $\tan\alpha$ varies between +1 and -1,
passing through zero, and $\tan\theta$ between +1 and -1, passing
through infinity. The two lines of regression merge into one line
when r = 1 (for $\tan\alpha = 1$ when the angle is a 45-degree
angle).

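Relationship between r and the First-order Standard Deviation.
It will be recalled that the standard deviation of the
deviations from the line of regression of $X_1$ on $X_2$ is given by

$$ N\sigma_{1.2}^2 = \Sigma X_1^2 - a_{1.2}\Sigma X_1 - b_{12}\Sigma X_1X_2 $$

If the variables are measured from their mean values, this
becomes

$$ N\sigma_{1.2}^2 = \Sigma x_1^2 - b_{12}\Sigma x_1x_2 $$

But

$$ b_{12} = r_{12}\frac{\sigma_1}{\sigma_2} \quad\text{and}\quad \Sigma x_1x_2 = N\sigma_1\sigma_2r_{12} $$

Hence,

$$ N\sigma_{1.2}^2 = N\sigma_1^2 - N\sigma_1^2r_{12}^2 $$

and

$$ \sigma_{1.2}^2 = \sigma_1^2(1 - r_{12}^2) $$

Finally,

$$ \sigma_{1.2} = \sigma_1\sqrt{1 - r_{12}^2} \qquad (16) $$

In the same manner,

$$ \sigma_{2.1} = \sigma_2\sqrt{1 - r_{12}^2} \qquad (17) $$

These formulas may also be put in the form

$$ r_{12}^2 = 1 - \frac{\sigma_{1.2}^2}{\sigma_1^2} \qquad (18) $$

$$ r_{12}^2 = 1 - \frac{\sigma_{2.1}^2}{\sigma_2^2} \qquad (19) $$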


It will thus be seen that r is closely related to the scatter
about the lines of regression. If this scatter is a small percentage
of the total scatter, indicating a high degree of representativeness
of a line of regression, then r is high. If the scatter is a large
percentage of the total variation in the dependent variable,
indicating a low degree of representativeness of a line of
regression, then the value of r is small. In other words, the better
a line of regression fits the data, the higher the value of r, and
vice versa. The Pearsonian coefficient of correlation is thus a
measure of the goodness of fit of the lines of regression.

The Pearsonian Coefficient of Correlation and the Analysis of
Variance. For every point on a bivariate scatter diagram such
as Fig. 107, there is a corresponding point on the line of regression
of $X_1$ on $X_2$. Geometrically this is obtained by projecting the
point vertically onto the line of regression (see Fig. 107).
Algebraically, the $x_1'$ coordinate of a point on the line of
regression is found by substituting the given value of $x_2$ in the
regression equation $x_1' = r_{12}\frac{\sigma_1}{\sigma_2}x_2$.

When the variables are measured from their mean values, the
mean of the various values of $x_2$ is zero. Hence the mean of
the corresponding values of $x_1'$ is zero also. The standard
deviation of these $x_1'$ values is thus

$$ \sigma_{x_1'} = \sqrt{\frac{\Sigma x_1'^2}{N}} = \sqrt{\frac{r_{12}^2\frac{\sigma_1^2}{\sigma_2^2}\Sigma x_2^2}{N}} = r_{12}\sigma_1 $$

Equation (16) may thus be written

$$ \sigma_1^2 = \sigma_{x_1'}^2 + \sigma_{1.2}^2 \qquad (20) $$

This says that the total variance of the $x_1$ values is equal to the
variance of the corresponding points on the line of regression
plus the variance of the deviations from these points. Another
way of looking at this is to regard the total variation in $X_1$ as
made up of two parts, one consisting of the variation
($\sigma_{x_1'}^2$) due to its association with $X_2$ as represented
by the line of regression, the other representing the variation in
$X_1$ due to its association with factors independent of $X_2$
(that is, $\sigma_{1.2}^2$).

Similar analysis shows that

$$ \sigma_2^2 = \sigma_{x_2'}^2 + \sigma_{2.1}^2 \qquad (21) $$

in other words, that the total variance in $X_2$ is made up of a part
($\sigma_{x_2'}^2$) due to its association with $X_1$, as represented
by the line of regression of $X_2$ on $X_1$, and a part
($\sigma_{2.1}^2$) due to its association with factors independent of
$X_1$, as measured by the deviations from this line of regression.

The formula $\sigma_{x_1'}^2 = r_{12}^2\sigma_1^2$, which may also be
written $r_{12}^2 = \sigma_{x_1'}^2/\sigma_1^2$, sheds further light
on the meaning of r. It shows that $r_{12}^2$ measures the proportion
of the total variance in $X_1$ that is due to its association with
$X_2$. It also measures the proportion of the total variance in
$X_2$ that is due to its association with $X_1$.
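The splitting of the total variance into a part on the line of regression and a part about it can be checked numerically. The sketch below is an illustration under the assumption that the ten pairs of Table 32 are used; the names `var_line` and `var_resid` are hypothetical labels for $\sigma_{x_1'}^2$ and $\sigma_{1.2}^2$.

```python
import math

# The ten (X1, X2) pairs as listed in Table 32.
X1 = [1.6, 1.7, 2.1, 2.3, 2.5, 3.3, 3.1, 3.2, 4.0, 5.1]
X2 = [1.3, 1.5, 1.6, 2.0, 2.4, 3.0, 1.9, 2.3, 2.6, 3.3]

n = len(X1)
m1, m2 = sum(X1) / n, sum(X2) / n
x1 = [v - m1 for v in X1]
x2 = [v - m2 for v in X2]

sd1 = math.sqrt(sum(a * a for a in x1) / n)
sd2 = math.sqrt(sum(b * b for b in x2) / n)
r = sum(a * b for a, b in zip(x1, x2)) / (n * sd1 * sd2)

# Points on the line of regression of x1 on x2.
x1_line = [r * sd1 / sd2 * b for b in x2]

var_total = sd1 ** 2
var_line = sum(h * h for h in x1_line) / n                      # variance on the line
var_resid = sum((a - h) ** 2 for a, h in zip(x1, x1_line)) / n  # variance about the line

print(var_total, var_line + var_resid)   # the two parts add up to the total
print(var_line / var_total, r ** 2)      # share "explained" equals r squared
```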



CHAPTER XIV 


COMPUTATION OF r AND OTHER MEASURES 
OF CORRELATION 

The previous chapter was concerned with an explanation
of the various devices used to measure the association between
two variables. This chapter will illustrate their use by carrying
out a numerical analysis. Only linear correlation will be
considered here. Measures of nonlinear correlation will be discussed
in Chap. XV.

The order of analysis will be first to calculate the correlation
coefficient. This will be done for both ungrouped and grouped
data, and use will be made of short-cut methods of calculation.
For the grouped data, lines of regression will be computed, and
first-order variances and standard deviations. Reference will
again be made to the progressions of means, but the analysis will
be continued no further than in the previous chapter.

Computation of r from Ungrouped Data. Since $x = X - \bar{X}$,
it follows that

$$ \Sigma x_1x_2 = \Sigma(X_1 - \bar{X}_1)(X_2 - \bar{X}_2) = \Sigma X_1X_2 - N\bar{X}_1\bar{X}_2 $$

Likewise $\sigma_1$ and $\sigma_2$ are equal to

$$ \sigma_1 = \sqrt{\frac{\Sigma X_1^2 - N\bar{X}_1^2}{N}} \qquad \sigma_2 = \sqrt{\frac{\Sigma X_2^2 - N\bar{X}_2^2}{N}} $$

Hence the correlation coefficient can be computed from the
equation

$$ r_{12} = \frac{\Sigma X_1X_2 - N\bar{X}_1\bar{X}_2}{\sqrt{\Sigma X_1^2 - N\bar{X}_1^2}\,\sqrt{\Sigma X_2^2 - N\bar{X}_2^2}} \qquad (1) $$

To illustrate the use of this formula consider again the data
on exports and imports of Table 28.¹ These are reproduced
in Table 32, together with the calculations of $\Sigma X_1X_2$,
$\bar{X}_1$, $\bar{X}_2$, $\Sigma X_1^2$, and $\Sigma X_2^2$. Three
check columns are also employed. The checks are

column (1) + column (2) = column (6),
column (3) + column (4) = column (7),
and column (3) + column (5) = column (8).

¹ See p. 340.

In the preliminary calculations of Table 32, the last check
failed. This showed that a mistake had been made in either
column (5) or column (8), for column (3) checked with columns
(4) and (7). After some investigation the mistake was found in
column (8). By dividing the checks up in this way, an error
can be easily located. This sort of check is called a "Charlier
check."


Table 32.—Work Sheet for Computing r from Ungrouped Data

   (1)      (2)      (3)      (4)      (5)       (6)          (7)            (8)
    X1       X2     X1X2      X1²      X2²     X1 + X2   X1(X1 + X2)   X2(X1 + X2)

    1.6      1.3     2.08     2.56     1.69      2.9         4.64          3.77
    1.7      1.5     2.55     2.89     2.25      3.2         5.44          4.80
    2.1      1.6     3.36     4.41     2.56      3.7         7.77          5.92
    2.3      2.0     4.60     5.29     4.00      4.3         9.89          8.60
    2.5      2.4     6.00     6.25     5.76      4.9        12.25         11.76
    3.3      3.0     9.90    10.89     9.00      6.3        20.79         18.90
    3.1      1.9     5.89     9.61     3.61      5.0        15.50          9.50
    3.2      2.3     7.36    10.24     5.29      5.5        17.60         12.65
    4.0      2.6    10.40    16.00     6.76      6.6        26.40         17.16
    5.1      3.3    16.83    26.01    10.89      8.4        42.84         27.72

Σ = 28.9    21.9    68.97    94.15    51.81     50.8       163.12        120.78

$\bar{X}_1 = 2.89$     $\bar{X}_2 = 2.19$

Checks:

Σ(1) + Σ(2) = Σ(6):   28.9 + 21.9 = 50.8
Σ(3) + Σ(4) = Σ(7):   68.97 + 94.15 = 163.12
Σ(3) + Σ(5) = Σ(8):   68.97 + 51.81 = 120.78


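From Table 32, r is found according to Eq. (1) to be equal to

$$
\begin{aligned}
r_{12} &= \frac{68.97 - 10(2.89)(2.19)}{\sqrt{94.15 - 10 \times 2.89^2}\,\sqrt{51.81 - 10 \times 2.19^2}} \\
&= \frac{68.97 - 63.291}{\sqrt{94.15 - 83.521}\,\sqrt{51.81 - 47.961}} \\
&= \frac{5.679}{\sqrt{10.629}\,\sqrt{3.849}} = \frac{5.679}{(3.26)(1.962)} = \frac{5.679}{6.396} \\
&= 0.8879
\end{aligned}
$$

$$ \sigma_1 = \sqrt{1.0629} \qquad \sigma_2 = \sqrt{0.3849} $$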
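The work sheet and its three checks lend themselves to a short program. The following is a sketch only, built on the figures of Table 32; the names `col6`, `col7`, `col8`, and `checks` are illustrative and mirror the column numbers of the table.

```python
import math

# Columns (1) and (2) of Table 32.
X1 = [1.6, 1.7, 2.1, 2.3, 2.5, 3.3, 3.1, 3.2, 4.0, 5.1]
X2 = [1.3, 1.5, 1.6, 2.0, 2.4, 3.0, 1.9, 2.3, 2.6, 3.3]
N = len(X1)

# Columns (3) to (5): products and squares.
sum_prod = sum(a * b for a, b in zip(X1, X2))
sum_sq1 = sum(a * a for a in X1)
sum_sq2 = sum(b * b for b in X2)

# Check columns (6) to (8).
col6 = [a + b for a, b in zip(X1, X2)]
col7 = [a * (a + b) for a, b in zip(X1, X2)]
col8 = [b * (a + b) for a, b in zip(X1, X2)]

checks = {
    "(1)+(2)=(6)": math.isclose(sum(X1) + sum(X2), sum(col6)),
    "(3)+(4)=(7)": math.isclose(sum_prod + sum_sq1, sum(col7)),
    "(3)+(5)=(8)": math.isclose(sum_prod + sum_sq2, sum(col8)),
}

# Eq. (1): r from the raw sums and the two means.
mean1, mean2 = sum(X1) / N, sum(X2) / N
r = (sum_prod - N * mean1 * mean2) / (
    math.sqrt(sum_sq1 - N * mean1 ** 2) * math.sqrt(sum_sq2 - N * mean2 ** 2)
)

print(checks)
print(round(r, 4))
```

If a figure in one of the columns were mistyped, exactly one or two of the three checks would fail, which is how the text locates the error in column (8).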














Table 33.—Grades in Second- and First-semester English, 81 Freshmen
at Mount Holyoke

(A, B, C, and D grades have been converted to a numerical scale)

Student number   Second-semester grade X1   First-semester grade X2   Student number   Second-semester grade X1   First-semester grade X2

1 

240 

220 

41 

260 

260 

2 

200 

180 

42 

180 1 

160 

3 

260 

240 

43 

1 100 

60 

4 1 

260 

260 

44 

200 1 

220 

5 

ICO 

160 

45 

1 200 

200 

6 

240 

220 

46 

160 1 

120 

7 ! 

220 

200 

47 

i 180 

160 

8 

60 1 

120 

43 

280 ! 

220 

9 

220 

240 

49 

i 200 

200 

10 

200 

180 

30 

220 ' 

220 

11 

1 220 

220 

31 

! 220 

200 

12 

HO 

180 

52 

240 

220 

IS 

' 160 1 

J20 


' m 

60 

14 

240 

200 

54 

220 

220 

IS 

' 260 1 

240 

55 

' 240 ! 

200 

16 

1 200 

ICO 

oO 

: 200 

220 

17 

' 200 1 

160 

»7 

' 220 

220 

18 

1 240 

240 

58 

1 220 

200 

19 

240 ' 

220 

59 

240 

200 

20 

1 240 

220 

60 

j 180 

140 

21 

160 ! 

140 

61 

160 

140 

22 

1 220 

240 

62 

1 240 

220 

23 

i 200 

200 

63 

260 

260 

24 

1 100 

1 100 

64 

i 160 

120 

2j 

100 ' 

140 

65 

260 ! 

240 

26 

1 200 

1 160 

66 

220 1 

ISO 

27 

180 

180 

67 

220 ' 

240 

28 

180 

' 160 

68 

260 1 

260 

29 

240 

240 

69 

240 ! 

220 

30 

200 

* 200 

70 

200 1 

200 

31 

200 

200 

71 

140 

120 

32 

ISO 

160 

72 

260 1 

240 

33 

200 

220 

73 

200 

180 

34 

400 

120 

74 

300 

230 

35 

240 

240 

75 

ISO 

140 

3Q 1 

220 

220 

76 ' 

220 

180 

37 

220 

240 

77 , 

180 

180 

38 

160 

120 

78 ! 

300 

280 

39 1 

200 

200 

79 1 

220 

220 

10 

220 

220 

SO 1 

220 

200 

1 



81 

200 

ISO 




This agrees to two decimal places with the previous calculations
of this coefficient made in Chap. XIII. The difference is
due to the different ways of rounding off decimals in making the
calculations.

Computation of r from Grouped Data. The Data. The data
to be used to illustrate the computation of r for grouped data
are given in Table 33. They may be explained as follows:

First pair $X_1$, $X_2$. The first pair of observations are the
second-semester and the first-semester English grades,
respectively, of student 1, viz., 240, 220.

Second pair $X_1$, $X_2$. The second pair of observations are the
second-semester and the first-semester English grades,
respectively, of student 2, viz., 200, 180.

Third pair $X_1$, $X_2$. The third pair of observations are the
second-semester and the first-semester English grades,
respectively, of student 3, viz., 260, 240.

The Correlation, or Bivariate Frequency, Table. After the data
have been tabulated as in Table 33, a correlation table, which
is in effect a bivariate frequency distribution, is constructed.
The table is set up with class-interval scales suitable for each
variable,¹ and additional columns and rows are arranged for the
required calculations. In the center of each cell of the
correlation table, frequencies are shown; for example, in the first
column opposite the $X_1$ scale of Table 34, 2 is the frequency of
occurrence of $X_1$ between 100 and 120 and $X_2$ between 60 and 80.
Two students, in other words, have grades in second-semester
English between 100 and 120 and grades in first-semester English
between 60 and 80. When all the frequencies are recorded in
the correlation table, it may be used as a work sheet for the
calculation of the coefficient of correlation.

Table 34.—Correlation Table
Showing the relationship between second-semester ($X_1$) and
first-semester ($X_2$) grades of 81 Mount Holyoke freshmen

Short Method for Calculating r. Like the standard deviation
and the mean, it is possible to find r by a short method making
use of arbitrary origins.

In the formula for r,

$$ r_{12} = \frac{\Sigma Fx_1x_2}{N\sigma_1\sigma_2} \qquad (2) $$

$\sigma_1$ and $\sigma_2$ may be calculated by the short method that
has already been presented.² It remains only to evaluate
$\Sigma Fx_1x_2$ in terms of deviations from the arbitrary origins,
$A_1$ and $A_2$, selected for the respective variables and in terms
of the two correction factors $C_1$ and $C_2$. To do this, note that³

$$ x_1 = d_1 - C_1 \quad\text{where}\quad C_1 = \frac{\Sigma Fd_1}{N} $$

and

$$ x_2 = d_2 - C_2 \quad\text{where}\quad C_2 = \frac{\Sigma Fd_2}{N} $$

Therefore,

$$ \Sigma Fx_1x_2 = \Sigma F(d_1 - C_1)(d_2 - C_2) \qquad (3) $$

which expanded is as follows:

$$ \Sigma Fx_1x_2 = \Sigma Fd_1d_2 - C_1\Sigma Fd_2 - C_2\Sigma Fd_1 + NC_1C_2 \qquad (4) $$

But $\Sigma Fd_1 = NC_1$ and $\Sigma Fd_2 = NC_2$, and hence

$$ \Sigma Fx_1x_2 = \Sigma Fd_1d_2 - NC_1C_2 \qquad (5) $$

and accordingly the formula for calculating r by the use of an
arbitrary origin for $X_1$ and an arbitrary origin for $X_2$ is

$$ r_{12} = \frac{\Sigma Fd_1d_2 - NC_1C_2}{N\sigma_1\sigma_2} \qquad (6) $$

Further saving in calculation results, however, if this formula
is put in terms of class-interval units. In other words, the
following form is more conveniently used:⁴

$$ r_{12} = \frac{\Sigma F\frac{d_1}{i_1}\frac{d_2}{i_2} - N\frac{C_1}{i_1}\frac{C_2}{i_2}}{N\frac{\sigma_1}{i_1}\frac{\sigma_2}{i_2}} \qquad (7) $$

The correlation table serves as a work sheet for the calculation
of the coefficient of correlation, as follows:

¹ On the question of proper selection of class intervals, see pp. 199-206.

² See pp. 214-215.

³ When $C_1 = \bar{X}_1 - A_1$, $\bar{X}_1 = A_1 + C_1$. By definition
$x_1 = X_1 - \bar{X}_1$ and $d_1 = X_1 - A_1$; so that
$x_1 = d_1 + A_1 - A_1 - C_1 = d_1 - C_1$.

⁴ The value of the numerator alone is $\Sigma Fx_1x_2$ in
class-interval units; if the problem is one in multiple correlation,
it will be convenient to have a record of this value as well as the
value of $r_{12}$.
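The identity $\Sigma Fx_1x_2 = \Sigma Fd_1d_2 - NC_1C_2$ can be verified on any grouped table. The sketch below uses a small made-up table (the frequencies and mid-points are hypothetical and are not those of Table 34) and compares the product sum taken about the true means with the one taken about arbitrary origins.

```python
# Hypothetical grouped data, NOT the figures of Table 34:
# {(mid-point of X1 class, mid-point of X2 class): frequency F}
table = {(110, 70): 2, (150, 130): 3, (170, 170): 4, (210, 190): 6, (230, 250): 5}

A1, A2 = 190, 190                     # arbitrary origins
N = sum(table.values())

# Left side: deviations x1, x2 measured from the true means.
mean1 = sum(F * m1 for (m1, m2), F in table.items()) / N
mean2 = sum(F * m2 for (m1, m2), F in table.items()) / N
lhs = sum(F * (m1 - mean1) * (m2 - mean2) for (m1, m2), F in table.items())

# Right side: deviations d1, d2 from the arbitrary origins, then corrected.
C1 = sum(F * (m1 - A1) for (m1, m2), F in table.items()) / N
C2 = sum(F * (m2 - A2) for (m1, m2), F in table.items()) / N
sum_Fd1d2 = sum(F * (m1 - A1) * (m2 - A2) for (m1, m2), F in table.items())
rhs = sum_Fd1d2 - N * C1 * C2

print(lhs, rhs)                       # the two sides should agree
```

The choice of origin affects only the size of the numbers handled, not the result, which is why origins near the middle of the range are preferred.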




1. An arbitrary origin is chosen for each variable; thus, in
Table 34, $A_1 = 190$ and $A_2 = 190$. The arbitrary origins are
taken at the mid-points of a class interval about midway in the
range of the distribution in order to reduce to a minimum the
necessary computations.

2. A column at the side and a row at the bottom of the
correlation table are used to indicate, in class-interval units, the
deviations of each variable from the respective arbitrary origins.
This supplies entries for the rows under the caption $d_1/i_1$ and
entries in the columns opposite the stub headings $d_2/i_2$. In
Table 34, $i_1 = i_2 = 20$.

3. The next column at the side and row at the bottom of Table
34 are for the purpose of entering the frequencies multiplied by
the class-interval deviations. The sums of this column and row,
respectively, are used in the calculation of the correction figures
$C_1/i_1$ and $C_2/i_2$ and in the computation of the means of the
separate frequency distributions. The sums of the columns give
the separate frequency distribution of $X_2$, and the sums of the
rows give the separate frequency distribution of $X_1$.

4. The next column and the next row are for the frequencies
multiplied by the class-interval deviations squared, in order to
obtain sums from which to calculate the standard deviations of
the respective variables.

5. The means and standard deviations of the two variables
are calculated as follows:¹

Calculation of the means:

$$ \bar{X}_1 = 190 + \tfrac{111}{81}(20) = 190 + 27.40740 = 217.4 $$

$$ \bar{X}_2 = 190 + \tfrac{57}{81}(20) = 190 + 14.074 = 204.1 $$

Calculation of the standard deviations:²

$$ \frac{\sigma_1}{i_1} = \sqrt{\frac{543}{81} - \left(\frac{111}{81}\right)^2} = \sqrt{6.70370 - 1.87791} = \sqrt{4.82579} = 2.1968 \qquad \sigma_1 = 43.94 $$

¹ Using Eq. (3), p. 213.

² Using the equation on p. 215.



$$ \frac{\sigma_2}{i_2} = \sqrt{\frac{493}{81} - \left(\frac{57}{81}\right)^2} = \sqrt{6.08642 - 0.49519} = \sqrt{5.59123} = 2.3646 \qquad \sigma_2 = 47.29 $$


6. The product of the deviations from the chosen arbitrary
origins is obtained for each cell in the correlation table. This is
obtained by multiplying the $d_1/i_1$ by the $d_2/i_2$ corresponding
to the position of that cell. For example, for $X_1$ = 100- and
$X_2$ = 60-, the cell in the first column and third row of Table 34,
there is a frequency of 2. According to the chosen arbitrary
origins, this cell has a product deviation (in terms of
class-interval units) of -6 multiplied by -4, or +24. Symbolically,
this is $(d_1/i_1)(d_2/i_2)$. The table is divided into four
quadrants by lines through the $A_1$ and $A_2$.

Two of these quadrants will have positive product deviations,
and two will have negative product deviations. A product
deviation is entered in each cell that contains frequencies and
appears in the lower right corner of the cell. None are entered
in the first quadrant because no frequencies occur in that
quadrant. Frequencies occur in only one cell in the third quadrant,
that is, in the $X_1$ = 200-, $X_2$ = 160- cell, for which the
deviation product is -1 multiplied by +1, or -1.

7. The product deviation in each cell is multiplied by the
frequency occurring in that cell, in order to obtain the proper
number of product deviations of that particular cell. The
product deviation occurs once in some cases and several times in
others. Obviously, when it occurs several times the sum of the
product deviations is obtained by multiplying by the frequencies.
These figures are entered in each cell in the upper right corner.
Symbolically, they are $F(d_1/i_1)(d_2/i_2)$ for each cell.

8. The sum of the figures calculated in item 7 is obtained, that
is, the sum of the product deviations multiplied by their
respective frequencies. This is accomplished by adding the figures
occurring in the upper right corner of each cell by rows and by
columns and adding the sums of the rows or the sums of the
columns to obtain the final sum. If both the latter are computed,
there will be a cross check on addition. Symbolically, the
final aggregate is $\Sigma F(d_1/i_1)(d_2/i_2)$.

9. The coefficient of correlation is calculated by the use of
Eq. (7) shown above, as follows:

Calculation of r:

$$ r_{12} = \frac{455 - 81\left(\frac{111}{81}\right)\left(\frac{57}{81}\right)}{81(2.19677)(2.36458)} = \frac{455 - 78.11111}{420.74964} = \frac{376.88889}{420.74964} = +0.89576 $$
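Steps 1 to 9 can be condensed into a short calculation once the work-sheet totals are known. The sketch below takes those totals as implied by the worked figures of this section (N = 81, 111, 57, 543, 493, and 455, all in class-interval units); they should be treated as assumptions if a copy of Table 34 shows different totals, and the variable names are illustrative.

```python
import math

# Work-sheet totals in class-interval units, as implied by the worked figures.
N = 81
i1 = i2 = 20            # class intervals
A1 = A2 = 190           # arbitrary origins
sum_Fd1 = 111           # sum of F * (d1/i1)
sum_Fd2 = 57            # sum of F * (d2/i2)
sum_Fd1_sq = 543        # sum of F * (d1/i1)^2
sum_Fd2_sq = 493        # sum of F * (d2/i2)^2
sum_Fd1d2 = 455         # sum of F * (d1/i1) * (d2/i2)

# Step 5: means and standard deviations.
mean1 = A1 + i1 * sum_Fd1 / N
mean2 = A2 + i2 * sum_Fd2 / N
s1 = math.sqrt(sum_Fd1_sq / N - (sum_Fd1 / N) ** 2)   # sigma_1 / i1
s2 = math.sqrt(sum_Fd2_sq / N - (sum_Fd2 / N) ** 2)   # sigma_2 / i2

# Step 9: Eq. (7).
r = (sum_Fd1d2 - N * (sum_Fd1 / N) * (sum_Fd2 / N)) / (N * s1 * s2)

print(round(mean1, 1), round(mean2, 1))
print(round(i1 * s1, 2), round(i2 * s2, 2))
print(round(r, 4))
```

Because the machine carries more decimals than the hand computation, the last digit of r may differ slightly from the 0.89576 of the text.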


Lines of Regression and First-order Variances. All the values
that are needed to find the lines of regression of Table 34 have
now been calculated. There are two lines of regression for each
correlation table: the first one represents the regression of $X_1$
on $X_2$ and the second the regression of $X_2$ on $X_1$. Since r has
been computed, the easiest formulas for calculating these two
lines (in original units measured from the intersection of the
means of the two variables as an origin and not in class-interval
units) are as follows:

$$ x_1' = r_{12}\frac{\sigma_1}{\sigma_2}x_2 \qquad x_2' = r_{12}\frac{\sigma_2}{\sigma_1}x_1 $$

These equations can be expressed in the units of the original
data, that is, the scale as originally formed rather than in
deviations from the means, as follows:

$$ X_1' - \bar{X}_1 = r_{12}\frac{\sigma_1}{\sigma_2}(X_2 - \bar{X}_2) \qquad X_2' - \bar{X}_2 = r_{12}\frac{\sigma_2}{\sigma_1}(X_1 - \bar{X}_1) $$

Calculation. For the problem illustrated, the lines of
regression are calculated as

$$ x_1' = 0.8322x_2 \qquad x_2' = 0.9642x_1 $$

By substituting $X_1' - \bar{X}_1$ for $x_1'$, $X_2 - \bar{X}_2$ for
$x_2$, and similarly for the second line, these equations may be
written as follows:

$$ X_1' - 217.4 = 0.8322(X_2 - 204.1) $$
$$ X_1' = 0.832X_2 + 47.58 $$
$$ X_2' - 204.1 = 0.9642(X_1 - 217.4) $$
$$ X_2' = 0.964X_1 - 5.55 $$

In this form the equations are more easily interpreted as
prediction equations. The first equation says that when a
student has a grade of 100 in first-semester English the predicted
grade in second-semester English is 83.2 + 47.6 = 130.8. The
second equation says that when a student has a grade of 100
in second-semester English the predicted grade in first-semester
English is 96.4 - 5.55 = 90.8.
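The two prediction equations can be rebuilt from r, the means, and the standard deviations. The sketch below is illustrative: `regression_line` is a hypothetical helper, and the inputs are the rounded figures quoted in the text, so the results match the printed ones only to within rounding.

```python
# Rounded figures quoted in the text for the Mount Holyoke data.
r = 0.89576
mean1, sd1 = 217.4, 43.94     # second-semester grades, X1
mean2, sd2 = 204.1, 47.29     # first-semester grades, X2


def regression_line(r, mean_y, sd_y, mean_x, sd_x):
    """Slope and intercept of the least-squares line of y on x."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


b12, a12 = regression_line(r, mean1, sd1, mean2, sd2)   # X1 on X2
b21, a21 = regression_line(r, mean2, sd2, mean1, sd1)   # X2 on X1

pred_second = a12 + b12 * 100   # predicted X1 when X2 = 100
pred_first = a21 + b21 * 100    # predicted X2 when X1 = 100

print(round(b12, 3), round(a12, 2), round(pred_second, 1))
print(round(b21, 3), round(a21, 2), round(pred_first, 1))
```

Note that the two lines are not inverses of one another: predicting $X_1$ from $X_2$ and then $X_2$ back from that prediction does not return the starting grade unless r = 1.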

The two lines of regression are shown in Figs. 105 and 106.¹
In Fig. 105 line aa' represents the first line of regression,

$$ X_1' = 0.832X_2 + 47.58 $$

The small crosses show the location of the means of the columns
(calculated and shown in Table 34). It is to be noted that the
line of regression follows the progression of the means of the
columns.

In Fig. 106, line bb' represents the second line of regression,
$X_2' = 0.964X_1 - 5.55$. The small circles show the location of
the means of the rows (calculated and shown in Table 34).
It is to be noted that the line of regression follows the progression
of the means of the rows.

The scatter about each of the lines of regression, the first-order
$\sigma$, is calculated by using the following formulas:²

$$ \sigma_{1.2} = \sigma_1\sqrt{1 - r_{12}^2} \qquad \sigma_{2.1} = \sigma_2\sqrt{1 - r_{12}^2} $$

In the problem illustrated, the first-order standard deviations
are calculated as follows:

$$ \sigma_{1.2} = 43.94(0.44453) = 19.53 \qquad \sigma_{2.1} = 47.29(0.44453) = 21.02 $$

(When $r = 0.89576$, $\sqrt{1 - r^2} = 0.44453$.)

In Figs. 105 and 106, which show the lines of regression, there
are also shown the limits indicated by the first-order standard
deviations. Between these limits, that is, the line of regression
$\pm\,\sigma_{1.2}$ for Fig. 105 and the line of regression
$\pm\,\sigma_{2.1}$ for Fig. 106, lie roughly two thirds of the
frequencies, if it can be assumed that the population from which
the sample is derived is normally distributed. This gives some
idea of how accurate estimates based upon the lines of regression
are likely to be. It is to be noted that all the means of the
columns lie within the limits described by the first-order standard
deviations and that all but two of the means of the rows lie within
these limits. Each of the latter two means lying outside these
limits (the first row and the next to the last row) is based upon
only one student's record.

¹ See pp. 328, 329.

² Calculation of $\sqrt{1 - r^2}$ and $1 - r^2$ is avoided by the use
of J. R. Miner, Tables of $\sqrt{1 - r^2}$ and $1 - r^2$ for Use in
Partial Correlation and in Trigonometry, or an ordinary table of
sines and cosines, since $\sin x = \sqrt{1 - \cos^2 x}$.
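The limits described here amount to a band of one first-order standard deviation on either side of the predicted value. A minimal sketch, assuming the rounded regression equation and standard deviation quoted in the text (the function name `predicted_band` is hypothetical):

```python
import math

r = 0.89576
sd1 = 43.94                                  # sigma_1 from the text
sd_1_2 = sd1 * math.sqrt(1 - r ** 2)         # first-order standard deviation


def predicted_band(x2):
    """Predicted X1 for a given X2, with limits of one sigma_1.2 either side."""
    centre = 0.832 * x2 + 47.58              # regression of X1 on X2 from the text
    return centre - sd_1_2, centre, centre + sd_1_2


low, centre, high = predicted_band(200)
print(round(sd_1_2, 2))
print(round(low, 1), round(centre, 1), round(high, 1))
```

Under the normality assumption stated in the text, roughly two thirds of students with a first-semester grade of 200 would be expected to fall between the two limits.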

Progressions of Means. For these data the progressions of
means have already been discussed in Chap. XIII. Figures
105 and 106 show the means of the columns and the means of the
rows plotted from the values computed in Table 34 and
reproduced in Table 26.¹ Figure 105 represents the means of the
vertical frequency distributions of Tables 25 and 34; it gives the
progression of the means of $X_1$ with changing values of $X_2$.
Figure 106 gives a similar analysis for the means of the rows.

¹ See pp. 330 and 358.



CHAPTER XV 
NONLINEAR CORRELATION 


All the foregoing discussion has been concerned with those
cases in which the progression of the means is linear. In such
cases it was found that $r = \Sigma x_1x_2/N\sigma_1\sigma_2$ was an
appropriate measure of correlation. If the progression of means
and the distribution of cases around the means is as pictured in
Fig. 113, however, r may show little correlation, especially in
such cases as A and C, although there may be a high degree of
association between the variables. It is the purpose of this
chapter to indicate ways of describing and measuring such
nonlinear correlation.

As indicated in an earlier chapter, the best way of studying
any correlation is to make a bivariate scatter diagram of the
data. If the data are numerous enough to be grouped into class
intervals, then the means of the rows and columns may be
computed and the variation in the means of each variable with
changes in the other variable may be studied.

In the linear case in which a line of regression was used to
measure the association it was found that, the smaller the
scatter, the higher the degree of correlation, the equation being

$$ r_{12}^2 = 1 - \frac{\sigma_{1.2}^2}{\sigma_1^2} $$

The same sort of formula may be used to measure the degree
of relationship indicated by the progression in the means. To
distinguish them from the correlation coefficient these measures
are called "correlation ratios" and are defined by the formulas

$$ \eta_{12}^2 = 1 - \frac{\sigma_{X_1-\bar{X}_c}^2}{\sigma_1^2} \qquad \eta_{21}^2 = 1 - \frac{\sigma_{X_2-\bar{X}_r}^2}{\sigma_2^2} \qquad (1) $$

where $\bar{X}_c$ represents the means of $X_1$ for various values of
$X_2$, $\bar{X}_r$ represents the means of $X_2$ for various values of
$X_1$, and $\sigma_{X_1-\bar{X}_c}^2$ and $\sigma_{X_2-\bar{X}_r}^2$
represent the sum of the squared deviations around the means,
pooled for all the column or row means and divided by N; thus

$$ \sigma_{X_1-\bar{X}_c}^2 = \frac{\Sigma\Sigma(X_{1c} - \bar{X}_c)^2}{N} \qquad \sigma_{X_2-\bar{X}_r}^2 = \frac{\Sigma\Sigma(X_{2r} - \bar{X}_r)^2}{N} $$

The correlation ratios $\eta_{12}$ and $\eta_{21}$ give some indication
of the degree to which the means of one variable are successful in
measuring the variation in the other variable. They may be
used to measure either linear or nonlinear correlation.
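A small made-up example may help show why the correlation ratio is needed. In the sketch below the data are hypothetical (they are not taken from any table in the text): the column means rise and then fall, so r is zero although the association is close, while $\eta_{12}^2$ is large.

```python
import math

# Hypothetical data: the X1 values observed in each X2 class (column).
columns = {1: [2, 4], 2: [7, 9], 3: [10, 12], 4: [7, 9], 5: [2, 4]}

pairs = [(x2, x1) for x2, values in columns.items() for x1 in values]
N = len(pairs)
mean1 = sum(x1 for _, x1 in pairs) / N
mean2 = sum(x2 for x2, _ in pairs) / N

# Total sum of squares of X1, and the sum pooled about the column means.
total_ss = sum((x1 - mean1) ** 2 for _, x1 in pairs)
within_ss = 0.0
for values in columns.values():
    col_mean = sum(values) / len(values)
    within_ss += sum((v - col_mean) ** 2 for v in values)

eta_sq = 1 - within_ss / total_ss          # correlation ratio squared

# Pearsonian r for comparison.
cov = sum((x2 - mean2) * (x1 - mean1) for x2, x1 in pairs)
ss2 = sum((x2 - mean2) ** 2 for x2, _ in pairs)
r = cov / math.sqrt(ss2 * total_ss)

print(round(eta_sq, 4), round(r, 4))
```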

If the means of one variable seem to mark off a definite curve
or if in the case of ungrouped data a bivariate chart indicates a
fairly definite form of nonlinear variation, then the average
variation in one variable with changes in the other variable may
be indicated by drawing a smooth curve or fitting one by some
mathematical process, such as the method of least squares.
Such a curve might be called a "curve of regression." A line
of regression on the one hand indicates the average change
in one variable with a unit change of the other variable; this
average change is the same for all values of the independent
variable, since the slope of a straight line is constant. A curve
of regression on the other hand gives the average change in one
variable with a unit change in the other variable; but this average
change varies from one value of the independent variable to
another, since the slope of a curve changes at each point. The
technique of fitting a curve of regression will be discussed in a
subsequent section.

To measure the degree with which a curve of regression
measures the association between two variables, an index of
correlation is defined in a manner similar to the definitions of
r and $\eta$. It depends on the closeness with which the various
cases are scattered about the curve and is defined by the formula

$$ I_{12}^2 = 1 - \frac{\sigma_{X_1-C_{12}}^2}{\sigma_1^2} \qquad I_{21}^2 = 1 - \frac{\sigma_{X_2-C_{21}}^2}{\sigma_2^2} \qquad (2) $$

where $C_{12}$ and $C_{21}$ refer to the regression curves,
$\sigma_{X_1-C_{12}}^2$ refers to the variance of the deviations from
the curve of regression of $X_1$ on $X_2$, and $\sigma_{X_2-C_{21}}^2$
refers to the variance of the deviations from the curve of
regression of $X_2$ on $X_1$.

Although $r_{12} = r_{21}$, the two correlation ratios and the two
indexes of correlation are not necessarily equal. That is,
$\eta_{12} \ne \eta_{21}$ and $I_{12} \ne I_{21}$. In addition,
$\eta \ge I \ge r$.

Since the variance about the means or about a curve is never
greater than the total variance, these formulas always give a
positive value and their square roots are indeterminate as to
sign. The square roots of $\eta^2$ and $I^2$ give an index of
correlation, and the question as to whether it is a positive or
negative relationship must be answered by reference to a
correlation table or a figure showing the progression of means or
curve of regression. In the case of curvilinear correlation, the
question of positive or negative relationship often is irrelevant
because two variables may be positively correlated up to a certain
point and then negatively correlated beyond that point.
Consequently, it becomes necessary to describe the entire
relationship. For example, the death rate due to puerperal
septicemia is correlated with ages of the female population in a
nonlinear manner. The relationship between the two is best
described by a curve or polygon of regression, which would have to
be seen in its entirety if the relationship is to be completely
understood. This is illustrated in Fig. 55 (page 151). If r merely
were calculated, it might conceivably be zero when there is in fact
a close relationship. An index of such a relationship is found by
the calculation of the $\eta$'s or the $I$'s.
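The index of correlation can be illustrated in the same way as the correlation ratio. The sketch below is an assumption-laden illustration: the ten pairs are made up, a parabola is chosen as the curve of regression, and `fit_quadratic` is a hypothetical helper that solves the least-squares normal equations by elimination.

```python
def fit_quadratic(xs, ys):
    """Least-squares coefficients (a, b, c) of y = a + b*x + c*x**2."""
    # Normal equations: power sums of x on the left, moments of y on the right.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [u - factor * v for u, v in zip(m[row], m[col])]
    return tuple(m[k][3] / m[k][k] for k in range(3))


# Hypothetical data: X2 values (xs) and the X1 values observed with them (ys).
xs = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
ys = [2, 4, 7, 9, 10, 12, 7, 9, 2, 4]

a, b, c = fit_quadratic(xs, ys)
mean_y = sum(ys) / len(ys)
total_ss = sum((y - mean_y) ** 2 for y in ys)
resid_ss = sum((y - (a + b * x + c * x * x)) ** 2 for x, y in zip(xs, ys))

index_sq = 1 - resid_ss / total_ss       # index of correlation squared
print(round(index_sq, 4))
```

For these figures the index falls a little below the correlation ratio computed about the column means, since a smooth curve cannot follow the means as closely as the means follow themselves.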

Calculation of the Correlation Ratio. The calculation of the
correlation ratio will be illustrated by reference to the Mount
Holyoke data in Table 34 (page 358). Although the relationship
appears to be linear, it is worth while to compute the correlation
ratio to see how close it comes to r. If the difference is not very
great, the linearity will be numerically demonstrated.

Equation (1) for the correlation ratios may be put in the form

$$ \eta_{12}^2 = \frac{\sigma_1^2 - \sigma_c^2}{\sigma_1^2} \qquad \eta_{21}^2 = \frac{\sigma_2^2 - \sigma_r^2}{\sigma_2^2} \qquad (3) $$

where $\sigma_c$ and $\sigma_r$ are abbreviated expressions for
$\sigma_{X_1-\bar{X}_c}$ and $\sigma_{X_2-\bar{X}_r}$ and thus
represent the average standard deviations around the means of the
columns and the means of the rows, respectively, as explained
above. In order to apply these equations for finding the
correlation ratios it is necessary to find the values of
$\sigma_c^2$ and $\sigma_r^2$. This can be most conveniently done
with the help of a work sheet that makes use of arbitrary origins
($A_1$ and $A_2$) and class-interval deviations ($d_1/i_1$ and
$d_2/i_2$). Such a work sheet is Table 35, in which the
computations are carried out for the data of Table 34. The
algebraic foundation for these computations is as follows:

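It is assumed that the same $A_1$ is used for every column as
for the total frequency distribution of $X_1$. Then for each
column the sum of the squares of the deviations from the column
mean would be¹

$$ \sum^{N_c}(X_{1c} - \bar{X}_c)^2 = i_1^2\left[\Sigma F\left(\frac{d_1}{i_1}\right)^2 - \frac{\left(\Sigma F\frac{d_1}{i_1}\right)^2}{N_c}\right] $$

the sums on the right being taken over the one column. For the sum
of all "m" columns this would be

$$ \sum^{m}\sum^{N_c}(X_{1c} - \bar{X}_c)^2 = i_1^2\left[\sum^{m}\Sigma F\left(\frac{d_1}{i_1}\right)^2 - \sum^{m}\frac{\left(\Sigma F\frac{d_1}{i_1}\right)^2}{N_c}\right] \qquad (4) $$

But

$$ \sum^{m}\Sigma F\left(\frac{d_1}{i_1}\right)^2 = \sum^{N}F\left(\frac{d_1}{i_1}\right)^2 $$

the sum for the whole table, and by definition
$\sum^{m}\sum^{N_c}(X_{1c} - \bar{X}_c)^2 = N\sigma_c^2$.

Therefore,

$$ N\sigma_c^2 = i_1^2\left[\sum^{N}F\left(\frac{d_1}{i_1}\right)^2 - \sum^{m}\frac{\left(\Sigma F\frac{d_1}{i_1}\right)^2}{N_c}\right] $$

It has been determined already that²

$$ N\sigma_1^2 = i_1^2\left[\sum^{N}F\left(\frac{d_1}{i_1}\right)^2 - \frac{\left(\sum^{N}F\frac{d_1}{i_1}\right)^2}{N}\right] \qquad (5) $$

¹ This follows from Eqs. (1) and (2) of Chap. VII. For it will be
noted that $\Sigma Fx^2 = i^2\left[\Sigma F(d/i)^2 - (\Sigma F\,d/i)^2/N\right]$,
with $\nu_1' = \Sigma F(d/i)/N$ and $\mu_2 = \Sigma Fx^2/N$.

² See Eqs. (1) and (2) of Chap. VII and previous footnote.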

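Each of the variances in Eq. (3) may be expressed in
class-interval units so that its numerator is the arithmetic
difference between Eqs. (5) and (4) and its denominator is Eq. (5).
Thus

$$ \eta_{12}^2 = \frac{\sum^{m}\dfrac{\left(\Sigma F\frac{d_1}{i_1}\right)^2}{N_c} - \dfrac{\left(\sum^{N}F\frac{d_1}{i_1}\right)^2}{N}}{\sum^{N}F\left(\dfrac{d_1}{i_1}\right)^2 - \dfrac{\left(\sum^{N}F\frac{d_1}{i_1}\right)^2}{N}} \qquad (6) $$

Similarly it can be shown that for a table with "l" rows

$$ \eta_{21}^2 = \frac{\sum^{l}\dfrac{\left(\Sigma F\frac{d_2}{i_2}\right)^2}{N_r} - \dfrac{\left(\sum^{N}F\frac{d_2}{i_2}\right)^2}{N}}{\sum^{N}F\left(\dfrac{d_2}{i_2}\right)^2 - \dfrac{\left(\sum^{N}F\frac{d_2}{i_2}\right)^2}{N}} \qquad (7) $$

All the items in these two formulas (6) and (7) are to be found
on the work sheet in Table 34 with the exception of

$$ \sum^{m}\frac{\left(\Sigma F\frac{d_1}{i_1}\right)^2}{N_c} \quad\text{and}\quad \sum^{l}\frac{\left(\Sigma F\frac{d_2}{i_2}\right)^2}{N_r} $$

These two figures are obtained from the correlation-ratio work
sheet (Table 35).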

In Table 35, the frequency is placed in large type in the center
of a cell. Each column is now regarded as a separate frequency
distribution whose total number of cases $N_c$ is shown in the
row headed $N_c$. For each column the same arbitrary origin
($A_1 = 190$) as that used in Table 34 is used; hence the same
$d_1/i_1$ can be used for each column.

For all 11 columns, in the upper right corner of each interval
that contains a frequency is a number in small type representing
the $F\,d_1/i_1$ for that interval of the column. These are then summed,
giving for each column $\sum F\,d_1/i_1$, which is shown in the row
with that stub title. If each sum is divided by the number of cases
in the column, $N_c$, and multiplied by $i_1$, the resulting number is
the correction factor $C_c$ for that column. Accordingly, the mean
for that column ($\bar X_c$) can be found by using the formula
$\bar X_c = A_1 + C_c$. The results of this calculation are shown in the
row with the stub title $\left(\sum F\,d_1/i_1\right) i_1/N_c$, and the
column means are shown in the row with the stub heading $\bar X_c$.
In order to obtain the figure to be used in the formula for the
correlation ratio (that is, for the square root of $\eta_{12}^2$), another
row of figures is now added to Table 35, this set of figures
consisting of $\left(\sum F\,d_1/i_1\right)^2/N_c$ for each column, and when
these are summed for all columns (say for "m" columns), the resulting
figure is as follows:

$$\sum_1^m\frac{\left(\sum_1^{N_c}F\,\frac{d_1}{i_1}\right)^2}{N_c} = 473.1215$$

Using Eq. (6), the correlation ratio of $X_1$ on $X_2$ is thus¹

$$\eta_{12}^2 = \frac{473.1215 - 152.1111}{543.0000 - 152.1111} = \frac{321.0104}{390.8889} = 0.82123$$

$$\eta_{12} = 0.9062$$

Table 35. Correlation-Ratio Work Sheet
[Work sheet body not legible in the source.]

¹ The values of $\sum F\,d_1/i_1$ and $\sum F\,(d_1/i_1)^2$ are found in Table 34.
In that table the same $A_1$ was used for the frequency distribution of $X_1$.



To calculate the means of the rows and the correlation ratio
of $X_2$ on $X_1$, every row of Table 35 is treated as a separate
frequency distribution. The same $A_2$ is used for each row as
the $A_2$ in Table 34 for the entire $X_2$ distribution. Accordingly,
the same set of $d_2/i_2$ may be used for each row. The $F\,d_2/i_2$ for
each interval of each row is placed in small type in the lower
right corner of the interval. These summed for each row give
the $\sum F\,d_2/i_2$ shown in the column with that title heading. From
these are obtained the $C_r$ for each row, by the same procedure
as that used for finding the column means. For each row, the
$\left(\sum F\,d_2/i_2\right)^2/N_r$ is then computed and entered in the column
with that title. The sum of these for all row frequency
distributions (say l rows) constitutes the aggregate

$$\sum_1^l\frac{\left(\sum_1^{N_r}F\,\frac{d_2}{i_2}\right)^2}{N_r} = 436.2445$$

This is the value required by Eq. (7) for finding $\eta_{21}$. Thus,¹

$$\eta_{21}^2 = \frac{436.2445 - \dfrac{(57)^2}{81}}{493 - \dfrac{(57)^2}{81}} = \frac{436.2445 - 40.1111}{493.0000 - 40.1111} = \frac{396.1334}{452.8889} = 0.87467$$

$$\eta_{21} = 0.9352$$
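The two correlation ratios just computed can be reproduced with a short sketch. Python is an assumption here (the text works by hand), and the function name `correlation_ratio_squared` is hypothetical. The inputs are the work-sheet totals quoted above, all in class-interval units.

```python
import math

def correlation_ratio_squared(group_term, correction, total_squares):
    """Eq. (6) or (7) in class-interval units.

    group_term    : sum over columns (or rows) of (sum F d/i)^2 / N_group
    correction    : (sum F d/i)^2 / N for the whole distribution
    total_squares : sum F (d/i)^2 for the whole distribution
    """
    return (group_term - correction) / (total_squares - correction)

# X1 on X2 (columns): totals as quoted in the text.
eta12_sq = correlation_ratio_squared(473.1215, 152.1111, 543.0)

# X2 on X1 (rows): sum F d2/i2 = 57, sum F (d2/i2)^2 = 493, N = 81.
eta21_sq = correlation_ratio_squared(436.2445, 57 ** 2 / 81, 493.0)

eta12 = math.sqrt(eta12_sq)
eta21 = math.sqrt(eta21_sq)
print(round(eta12_sq, 5), round(eta12, 4))
print(round(eta21_sq, 5), round(eta21, 4))
```

Because numerator and denominator share the same correction term, only three totals per direction are needed once the work sheet is complete.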


¹ The values $\sum F\,d_2/i_2 = 57$ and $\sum F\,(d_2/i_2)^2 = 493$ are found in Table 34.
In that table the $A_2$ is the same as the $A_2$ used in the present table.

The Correlation Ratio and Analysis of Variance. The square
of the correlation ratio is a measure of the proportion of variance
due to correlation, in the same manner as it was indicated that
the square of the coefficient of correlation is a measure of the
proportion of variance due to correlation.

As has been explained, when expressed as the ratio of the variance
of the regression estimates of $X_1$ to the total variance $\sigma_1^2$,
the square of the coefficient of correlation reveals itself as the
proportion of the total variance that is due to correlation or
association with $X_2$ as measured by the line of regression of $X_1$
on $X_2$. In a similar manner, $\eta_{12}^2 = \sigma_{m_c}^2/\sigma_1^2$ and likewise
$\eta_{21}^2 = \sigma_{m_r}^2/\sigma_2^2$, where $\sigma_{m_c}$ and $\sigma_{m_r}$ denote the
standard deviations of the means of the columns and of the rows.
The square of the correlation ratio thus describes the proportion
of the total variance that is due to correlation as measured by
the fluctuations in the means of the columns and rows. The
standard deviation of the means of the columns squared is the
variance that is due to correlation of $X_1$ with $X_2$, and similarly
for the correlation of $X_2$ with $X_1$.

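To demonstrate algebraically that $\sigma_{m_c}^2 = \sigma_1^2 - \sigma_{1.2}^2$,
where $\sigma_{m_c}$ is the standard deviation of the means of the columns,
it is necessary first to note that the weighted mean of the means of
the columns equals $\bar X_1$. By definition,

$$\bar X_c = \frac{\sum_1^{N_c}X_1}{N_c}$$

and thus

$$N_c\bar X_c = \sum_1^{N_c}X_1$$

which, if summed for all columns, becomes

$$\sum_1^m N_c\bar X_c = \sum_1^m\sum_1^{N_c}X_1 \tag{8}$$

But

$$\sum_1^m\sum_1^{N_c}X_1 = \sum_1^N X_1$$

and hence, if Eq. (8) is divided by N, it is equivalent to

$$\frac{\sum_1^m N_c\bar X_c}{N} = \frac{\sum_1^N X_1}{N} = \bar X_1$$

which was to be proved.

If $\bar X_1$, the mean of the entire $X_1$ distribution, is now selected
as the arbitrary origin, $A_1$,

$$\bar X_c = \bar X_1 + C_c \qquad\text{or}\qquad C_c = \bar X_c - \bar X_1 \tag{9}$$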



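Also, when the mean of the entire distribution is selected as
the arbitrary origin for each column, the standard deviation
of the column, $\sigma_c$, is found by

$$\sigma_c^2 = \frac{\sum_1^{N_c}x_1^2}{N_c} - C_c^2 \qquad\left[x_1 = (X_1 - \bar X_1)\right]$$

On substituting $C_c = \bar X_c - \bar X_1$ and transposing, an expression
for each column similar to the following will result:

$$\frac{\sum_1^{N_c}x_1^2}{N_c} = \sigma_c^2 + (\bar X_c - \bar X_1)^2 \tag{a}$$

Multiplying the equation for each column by its $N_c$, respectively,
will result for each column in

$$\sum_1^{N_c}x_1^2 = N_c\sigma_c^2 + N_c(\bar X_c - \bar X_1)^2 \tag{b}$$

When the whole series, one for each column, of equations such as
(b) are totaled, the following result is obtained:

$$\sum_1^m\sum_1^{N_c}x_1^2 = \sum_1^m N_c\sigma_c^2 + \sum_1^m N_c(\bar X_c - \bar X_1)^2 \tag{c}$$

But, in this equation,

$$\sum_1^m\sum_1^{N_c}x_1^2 = \sum_1^N x_1^2 = N\sigma_1^2$$

and

$$\sum_1^m N_c\sigma_c^2 = N\sigma_{1.2}^2$$

Moreover, by definition, the explained variance, that is to say,
the variance of the means of the columns about the weighted
mean of these means, is as follows:

$$N\sigma_{m_c}^2 = \sum_1^m\sum_1^{N_c}(\bar X_c - \bar X_1)^2 = \sum_1^m N_c(\bar X_c - \bar X_1)^2$$

Consequently, (c) may be written

$$N\sigma_1^2 = N\sigma_{1.2}^2 + N\sigma_{m_c}^2$$

or

$$\sigma_1^2 = \sigma_{1.2}^2 + \sigma_{m_c}^2$$

Substituting the value of $\sigma_{1.2}^2 = \sigma_1^2 - \sigma_{m_c}^2$ in Eq. (3) for the
correlation ratio gives the following:

$$\eta_{12}^2 = \frac{\sigma_{m_c}^2}{\sigma_1^2} \tag{10}$$

Similarly, it can be shown that

$$\eta_{21}^2 = \frac{\sigma_{m_r}^2}{\sigma_2^2} \tag{11}$$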
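The decomposition of the total variance into a within-column part and a between-column part can be checked numerically. The sketch below uses Python and a small set of made-up columns (hypothetical values, not taken from Table 34 or Table 35); it only illustrates that the identity holds and that the ratio of the between-column variance to the total variance equals one minus the ratio of the within-column variance to the total.

```python
# Hypothetical columns of X1 values, one list per X2 class (not from the text).
columns = [[2.0, 4.0, 6.0], [5.0, 7.0], [8.0, 9.0, 10.0, 13.0]]

def mean(values):
    return sum(values) / len(values)

all_values = [x for col in columns for x in col]
n = len(all_values)
grand_mean = mean(all_values)

# Total variance of X1 about the grand mean.
total_var = sum((x - grand_mean) ** 2 for x in all_values) / n

# Average variance about the column means (sigma_1.2 squared).
within_var = sum((x - mean(col)) ** 2 for col in columns for x in col) / n

# Variance of the column means about the grand mean, weighted by N_c.
between_var = sum(len(col) * (mean(col) - grand_mean) ** 2 for col in columns) / n

eta_sq = between_var / total_var
print(total_var, within_var + between_var, eta_sq)
```

Any grouping of the same values gives the same total variance; only the split between the two parts changes.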

Calculation of Curvilinear Regression. To illustrate the statistical
problem involved in curvilinear regression and the calculation
of the correlation index I, data on cotton stocks, production,
and imports compared with cotton prices, 1920-1939, have been
selected. These are shown in Table 36 and plotted in Fig. 114.

Table 36. Stocks, Production, and Imports of Cotton and Price
of Cotton Received by Producers in the United States
Stocks at beginning of crop year plus year's production plus net imports
Prices are deflated by United States index of wholesale prices for crop years

Year         Average price,      Stocks, production,
             cents per pound     10 million bales
             X1                  X2
1920-1921    13.47               1.726
1921-1922    18.06               1.480
1922-1923    22.63               1.306
1923-1924    29.30               1.274
1924-1925    22.63               1.550
1925-1926    19.19               1.805
1926-1927    12.92               2.195
1927-1928    20.95               1.711
1928-1929    18.71               1.749
1929-1930    18.34               1.755
1930-1931    12.13               1.862
1931-1932     8.38               2.378
1932-1933    10.30               2.307
1933-1934    14.04               2.162
1934-1935    15.76               1.765
1935-1936    13.83               1.815
1936-1937    14.48               1.821
1937-1938    10.29               2.382
1938-1939    11.17               2.383

The position of plotted bivariates in Fig. 114 suggests that a
curve such as aa' might fit the data. The question of the type of
curve fitted is of particular importance in curvilinear regression,
and accordingly three types will be discussed for illustrative
purposes.

Fig. 114. Bivariate scatter diagram and fitted curve showing relationship
between the price of cotton and the supply of cotton.

Logarithmic Regression. The constant slope of a straight
line depicts the fact that the change in $X_1$ is constant for a
given quantity of change in $X_2$, and vice versa. The changing
slope of a curve depicts the fact that change in $X_1$ varies for
different values of $X_2$, and vice versa. One such curvilinear
relationship between $X_1$ and $X_2$ is as follows:

$$X_1X_2^b = k \tag{12}$$

In Eq. (12) the varying manner in which $X_1$ fluctuates with
respect to $X_2$ depends on the exponent b. If b is larger than 1,
a small change in $X_2$ must produce a large change in $X_1$ because,
as the equation indicates, their product (when $X_2$ is raised to
the b power) is constant. If b is equal to 1, the changes in $X_1$
must be just proportionate (in an inverse manner) to the changes
in $X_2$. If b is less than 1, the changes in $X_1$ must be proportion-
ately less than the changes in $X_2$. If such an equation is used
to describe the line of regression of price of cotton on stocks
and production of cotton, a very flexible price of cotton will
result in a value of b larger than 1; a very inflexible price of
cotton will result in a value of b less than 1. The nature of
Eq. (12) assumes that the flexibility in the price of cotton remains
the same regardless of stocks and production, because it sets up
the hypothesis that the product $X_1X_2^b$ equals a constant.

If such an equation is assumed to be suitable for the problem
in hand, the fitting of the curve of regression may be simplified
by first transforming the equation to its logarithmic form, namely,

$$\log X_1 + b\log X_2 = \log k = a \quad\text{if}\quad \log k = a \tag{13}$$

Fig. 115. The relationship of Fig. 114 in logarithmic form.



Figure 115 shows the effect of transforming the bivariate fre-
quency distribution from original units to logarithmic units.
The data plotted are the same as the data plotted in Fig. 114,
except that, in Fig. 115, the $X_1$ and $X_2$ scales refer to the log-
arithms of $X_1$ and $X_2$. When the bivariate logarithms shown in
the first two columns of Table 37 are plotted in Fig. 115, a straight
line fits the points. Thus the logarithmic transformation has
converted a curvilinear correlation problem into a simple linear
correlation problem in which the Pearsonian coefficient of
correlation is $r_{\log X_1 \log X_2}$, and the line of regression of $\log X_1$ on
$\log X_2$ is as follows:

$$\log X_1 - \text{mean of }\log X_1 = r_{\log X_1 \log X_2}\,\frac{\sigma_{\log X_1}}{\sigma_{\log X_2}}\,(\log X_2 - \text{mean of }\log X_2)$$

Table 37. Logarithms of United States Production, Stocks, and
Imports of Cotton and of the Price of Cotton Received by Producers
With columns for the squares of the logarithms and their cross products
X1 = price of cotton
X2 = stocks, production, and imports of cotton

log X1     log X2     log X1 log X2   log^2 X1   log^2 X2
1.1294     0.2370     0.2677          1.2755     0.0562
1.2567     0.1703     0.2140          1.5793     0.0290
1.3547     0.1159     0.1570          1.8352     0.0134
1.4669     0.1052     0.1543          2.1518     0.0111
1.3547     0.1903     0.2578          1.8352     0.0362
1.2831     0.2565     0.3291          1.6464     0.0658
1.1113     0.3414     0.3794          1.2350     0.1166
1.3212     0.2333     0.3082          1.7456     0.0544
1.2721     0.2428     0.3089          1.6182     0.0590
1.2634     0.2443     0.3086          1.5962     0.0597
1.0839     0.2700     0.2927          1.1748     0.0729
0.9232     0.3762     0.3473          0.8523     0.1415
1.0128     0.3631     0.3677          1.0258     0.1318
1.1474     0.3349     0.3843          1.3165     0.1122
1.1976     0.2467     0.2954          1.4343     0.0609
1.1408     0.2589     0.2954          1.3014     0.0670
1.1608     0.2603     0.3022          1.3475     0.0678
1.0124     0.3769     0.3816          1.0250     0.1421
1.0481     0.3771     0.3952          1.0985     0.1422

Sums
22.5405    5.0011     5.7468          27.0945    1.4398


The equations of regression could be obtained in the above form
and then transformed into their antilogarithmic form, but in
this problem it is more convenient to find the regression equation
directly from the least-squares equations. Accordingly, the
regression statistics a and b of Eq. (13) may be calculated by
using the following least-squares equations:*

$$\sum\log X_1 = Na + b\sum\log X_2$$
$$\sum\log X_1\log X_2 = a\sum\log X_2 + b\sum\log^2 X_2$$

Table 37 is a work sheet providing columns to calculate $\sum\log X_1$,
$\sum\log X_2$, $\sum\log X_1\log X_2$, $\sum\log^2 X_1$, and $\sum\log^2 X_2$, using the
data of the cotton problem, for which the raw data are found in
Table 36. The first two columns of Table 37 show the logarithms
of the price of cotton in the United States and of the stocks, pro-
duction, and imports of cotton. The third column contains the
cross products of the logarithms. The fourth and fifth columns
contain the squares of the logarithms in the first two columns.
The sums of the columns provide the values that are required to
find the regression statistics a and b for Eq. (13).

Calculation of the regression of log X1 on log X2:

22.5405 = 19a + 5.0011b
 5.7468 = 5.0011a + 1.4398b

In order to solve, eliminate a by multiplying the second equation
by 3.7992 and subtracting it from the first, as follows:

22.5405 = 19a + 5.0011b
21.8332 = 19a + 5.4701b
 0.7073 = -0.4690b
      b = -1.5081

Substituting this value of b in either of the equations will show
that

a = 1.5833

Accordingly, the equation of logarithmic regression of log X1
on log X2 is as follows:

$$\log X_1 = 1.5833 - 1.5081\log X_2$$

which may be transformed into antilogarithmic form as follows:

$$X_1X_2^{1.5081} = 38.31$$
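The elimination carried out above can be checked with a few lines of Python (an assumption; the text works by hand). The helper name `fit_line_from_sums` is hypothetical. It solves the two least-squares equations directly from the column totals of Table 37, so small differences from the printed a and b reflect the rounding of the multiplier 3.7992 in the hand calculation.

```python
def fit_line_from_sums(n, sum_x, sum_y, sum_xy, sum_xx):
    """Solve  sum_y = n*a + b*sum_x  and  sum_xy = a*sum_x + b*sum_xx  for a, b."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

# Column totals of Table 37: x = log X2, y = log X1.
a, b = fit_line_from_sums(
    n=19,
    sum_x=5.0011,    # sum of log X2
    sum_y=22.5405,   # sum of log X1
    sum_xy=5.7468,   # sum of log X1 * log X2
    sum_xx=1.4398,   # sum of (log X2)^2
)

k = 10 ** a  # constant of the antilogarithmic form X1 * X2**(-b) = k
print(round(a, 4), round(b, 4), round(k, 2))
```

The closed-form expressions for a and b are algebraically the same as eliminating a from the pair of equations and substituting back.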


* See p. 333.


Reciprocal Regression. Reciprocal regression is a special
form of the type of regression indicated by Eq. (12); for if b = 1,
changes in $X_1$ are related reciprocally to changes in $X_2$. In
other words, the equation becomes

$$X_1X_2 = k' \qquad\text{or}\qquad \frac{1}{X_1} = \frac{1}{k'}X_2 \tag{14}$$

which, placed in a more general form, is as follows:

$$\frac{1}{X_1} = a + bX_2 \tag{15}$$

If the reciprocal of each $X_1$ is found, it is possible to find the
equation for the reciprocal regression by fitting a straight line
to $X_2$ and the reciprocal of $X_1$, that is to say, by fitting an equa-
tion such as (15). Figure 116 shows the effect of transforming
one of the variables of the bivariate frequency distribution from
original units to reciprocal units. In the figure the vertical
scale is $1/X_1$, while the horizontal scale remains $X_2$. When the
bivariates shown in Table 38 are plotted in Fig. 116, a straight
line fits the points. Thus the reciprocal transformation has
converted a problem in curvilinear correlation into a problem
in simple linear correlation in which the Pearsonian coefficient
of correlation is $r_{(1/X_1)X_2}$, and the line of regression is as follows:

$$\frac{1}{X_1} - \text{mean of }\frac{1}{X_1} = r_{(1/X_1)X_2}\,\frac{\sigma_{1/X_1}}{\sigma_2}\,(X_2 - \bar X_2)$$

Fig. 116. The relationship of Fig. 114 in reciprocal form.

Table 38. United States Supply of Cotton and the Reciprocal of
the Price of Cotton Received by Producers
With columns for the squares and the cross products
X1 = price of cotton
X2 = supply of cotton

X2       1/X1      X2/X1     X2^2       (1/X1)^2
1.726    0.07424   0.12814   2.97908    0.00551
1.480    0.05537   0.08195   2.19040    0.00307
1.306    0.04419   0.05771   1.70564    0.00195
1.274    0.03413   0.04348   1.62308    0.00116
1.550    0.04419   0.06849   2.40250    0.00195
1.805    0.05211   0.09406   3.25803    0.00272
2.195    0.07740   0.16989   4.81803    0.00599
1.711    0.04773   0.08167   2.92752    0.00228
1.749    0.05345   0.09348   3.05900    0.00286
1.755    0.05453   0.09570   3.08003    0.00297
1.862    0.08244   0.15350   3.46704    0.00680
2.378    0.11933   0.28377   5.65488    0.01424
2.307    0.09709   0.22399   5.32225    0.00943
2.162    0.07123   0.15400   4.67424    0.00507
1.765    0.06345   0.11199   3.11523    0.00403
1.815    0.07231   0.13124   3.29423    0.00523
1.821    0.06906   0.12576   3.31604    0.00477
2.382    0.09718   0.23148   5.67392    0.00944
2.383    0.08953   0.21335   5.67869    0.00802

Sums
35.426   1.29896   2.54365   68.23983   0.09749


The equations of regression could be obtained in the above form
and then transformed into their original units, but in this
problem it is more convenient to find the regression equation
directly from the least-squares equations. The normal least-
squares equations are as follows:

$$\sum\frac{1}{X_1} = Na + b\sum X_2$$
$$\sum\frac{X_2}{X_1} = a\sum X_2 + b\sum X_2^2$$

Table 38 is a work sheet with columns in which the required sums
are obtained. Entering these sums in the above least-squares
equations makes it possible to evaluate the regression statistics
a and b for Eq. (15).

Calculation of the regression of 1/X1 on X2:

1.29896 = 19a + 35.4260b
2.54365 = 35.4260a + 68.23983b

Multiplying the first equation by 1.8645263 and subtracting the
result from the second equation eliminates a and gives a solution
for b as follows:

b = 0.05564

Substituting this value in either equation gives the solution of a
as follows:

a = -0.03538

The equation of regression is therefore as follows:

$$\frac{1}{X_1} = -0.03538 + 0.05564X_2$$

This equation describes the straight line plotted in Fig. 116.
Plotted on scales of $X_1$ and $X_2$, the equation is a curve.
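As a cross-check on the reciprocal fit, the sketch below recomputes a and b from the raw prices and supplies of Table 36 rather than from the rounded work-sheet columns. Python is an assumption, and the names `prices`, `supply` and `estimate_price` are hypothetical. Because the reciprocals are carried at full precision here, the results agree with the printed coefficients only to about three significant figures.

```python
# Raw data transcribed from Table 36 (19 crop years, 1920-1921 to 1938-1939).
prices = [13.47, 18.06, 22.63, 29.30, 22.63, 19.19, 12.92, 20.95, 18.71, 18.34,
          12.13, 8.38, 10.30, 14.04, 15.76, 13.83, 14.48, 10.29, 11.17]   # X1
supply = [1.726, 1.480, 1.306, 1.274, 1.550, 1.805, 2.195, 1.711, 1.749, 1.755,
          1.862, 2.378, 2.307, 2.162, 1.765, 1.815, 1.821, 2.382, 2.383]  # X2

n = len(prices)
recip = [1.0 / p for p in prices]              # 1/X1

sum_x = sum(supply)                            # sum of X2
sum_y = sum(recip)                             # sum of 1/X1
sum_xy = sum(x * y for x, y in zip(supply, recip))   # sum of X2/X1
sum_xx = sum(x * x for x in supply)            # sum of X2^2

# Solve the two normal equations for 1/X1 = a + b*X2.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = (sum_y - b * sum_x) / n

def estimate_price(x2):
    """Price implied by the fitted reciprocal regression at supply x2."""
    return 1.0 / (a + b * x2)

print(round(a, 5), round(b, 5), round(estimate_price(2.5), 2))
```

The fitted line is straight only on the reciprocal scale; converting back through `estimate_price` gives the curve in original units.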

Parabolic Regression. The curvilinear relationships so far
considered have been relationships that could readily be trans-
formed to a linear form, by taking logarithms or reciprocals.
Such transformations reduced the problem to one of simple
linear correlation between the transformed variables, and there
was little in the analysis that was different from that of the
previous chapters. A curve that cannot easily be transformed
to a linear form is the parabolic relationship

$$X_1 = a + b_1X_2 + b_2X_2^2 \tag{16}$$


This must be fitted directly. Fortunately, the nature of
the curve is such that the method of least squares can be used.
According to this, to fit a parabolic regression the least-squares
equations are obtained as follows:

The least-squares criterion is that

$$\sum d^2 = \sum(X_1 - X_1')^2 = \text{minimum}$$

or

$$\sum(X_1 - a - b_1X_2 - b_2X_2^2)^2 = \text{minimum}$$

where $X_1'$ denotes the value of $X_1$ estimated from the curve.
For this to be a minimum its total differential should be equal
to zero; that is, differentiating with respect to a, $b_1$, and $b_2$ and
setting equal to zero, the following normal equations are obtained:

$$\sum X_1 = Na + b_1\sum X_2 + b_2\sum X_2^2$$
$$\sum X_1X_2 = a\sum X_2 + b_1\sum X_2^2 + b_2\sum X_2^3$$
$$\sum X_1X_2^2 = a\sum X_2^2 + b_1\sum X_2^3 + b_2\sum X_2^4$$

Table 39 is a work sheet providing for the calculation and
checking of the sums entering into the three parabolic equations
of regression. Using the sums of the appropriate columns,
the following set of equations is obtained for the calculation of
the regression statistics a, $b_1$, and $b_2$ for the regression of $X_1$ on $X_2$
shown in Eq. (16):

306.58   = 19a      + 35.426b1   + 68.2398b2      (I)
542.7359 = 35.426a  + 68.2398b1  + 135.4744b2     (II)
994.4092 = 68.2398a + 135.4744b1 + 276.3974b2     (III)
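The coefficients of Eqs. (I) to (III) are simply power sums and cross-product sums of the raw data. The following sketch, in Python (an assumption) with hypothetical variable names, rebuilds them from the prices and supplies of Table 36 so that the work-sheet totals can be compared with a direct computation.

```python
# Raw data transcribed from Table 36.
prices = [13.47, 18.06, 22.63, 29.30, 22.63, 19.19, 12.92, 20.95, 18.71, 18.34,
          12.13, 8.38, 10.30, 14.04, 15.76, 13.83, 14.48, 10.29, 11.17]   # X1
supply = [1.726, 1.480, 1.306, 1.274, 1.550, 1.805, 2.195, 1.711, 1.749, 1.755,
          1.862, 2.378, 2.307, 2.162, 1.765, 1.815, 1.821, 2.382, 2.383]  # X2

n = len(prices)

# Power sums of X2 (second to fourth powers) and cross products with X1.
s_x = sum(supply)
s_x2 = sum(x ** 2 for x in supply)
s_x3 = sum(x ** 3 for x in supply)
s_x4 = sum(x ** 4 for x in supply)
s_y = sum(prices)
s_xy = sum(x * y for x, y in zip(supply, prices))
s_x2y = sum(x ** 2 * y for x, y in zip(supply, prices))

# Rows of the normal equations in the order (coefficient of a, b1, b2, right side).
normal_equations = [
    (n,    s_x,  s_x2, s_y),
    (s_x,  s_x2, s_x3, s_xy),
    (s_x2, s_x3, s_x4, s_x2y),
]
for row in normal_equations:
    print([round(v, 4) for v in row])
```

The matrix of coefficients is symmetric, which is why only seven distinct sums are needed for three equations.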


The solution of three equations for three unknowns should be
undertaken in an orderly manner; this is attempted in Table 40,
which is a work sheet following the so-called Doolittle method.
This work sheet provides a step-by-step check on the calculations
as the solution of the equations proceeds. In order to avoid
copying a, $b_1$, and $b_2$ each time an equation is written down, a, $b_1$,
and $b_2$ are written as the titles of columns in which their coeffi-
cients are entered. In the table only the coefficients are entered
in their respective columns, with the proper sign before each figure.
For example, row (1) of the table is presumed to read as follows:

19a + 35.4260b1 + 68.2398b2 - 306.58 = 0

which is the first equation above with slight rearrangement of
terms.



Table 39. United States Supply of Cotton and the Price of Cotton Received by Producers
With columns for the second, third, and fourth powers and for the necessary cross products to fit parabolic regressions
X1 = price of cotton
X2 = supply of cotton
[Table body not legible in the source.]

Table 40. Doolittle Work Sheet for Calculating Three Regression Statistics for Curvilinear Correlation
Regression of X1 on X2
[Table body not legible in the source.]


Three steps are involved in solving three equations for three
unknowns: (1) to get an equation in the three unknowns in which
the coefficient of one of the unknowns is unity, (2) to get an
equation in only two of the unknowns in which the coefficient
of one of the two is unity, and (3) to get an equation in only one
of the unknowns in which its coefficient is unity. When the
third step is accomplished, the value of the third unknown is
obtained. This value, applied in the equation obtained by the
second step, makes it possible to evaluate the second unknown;
and the first unknown is then obtained by applying these two
values in the equation obtained by the first step. This is the
same process as that used for finding two unknowns from two
equations.

Table 40 provides an orderly procedure and also a check
for these steps. The first step is accomplished in row (2) of
the table, by multiplying Eq. (I), copied in row (1), by the
negative reciprocal of the coefficient of a, that is, by -1/19; this
will make the coefficient of a become -1. The second step,
rows (3) to (6), eliminates a from two of the equations in order
to obtain in row (5) an equation in $b_1$ and $b_2$. In order to
eliminate a, the first equation must be divided by its own coeffi-
cient of a and multiplied by the coefficient of a of Eq. (II); in
other words, Eq. (I) must be multiplied by -35.4260/19. The
multiplier is given a negative sign so that, when added to Eq. (II),
the a term will cancel. Row (6) multiplies row (5) by the negative
reciprocal of the coefficient of $b_1$ in row (5). The third step, rows
(7) to (11), accomplishes the elimination of two of the variables,
ending with an equation in only one of them, which of course gives
its value. In order to do this, Eq. (III) is copied in row (7);
Eq. (I) is multiplied by a number that will give it a coefficient
of a equal to the coefficient of a of Eq. (III), that is, by -68.2398/19,
and this is entered in row (8); then the equation obtained in
row (6) (in terms of only $b_1$ and $b_2$, a having been eliminated) is
multiplied by a number that, combined with the two coefficients
of $b_1$ in rows (7) and (8), will give a sum of zero. The sum of
rows (7) to (9) will then eliminate both a and $b_1$, giving in row (10)
such an equation. When row (10) is multiplied by the negative
reciprocal of its coefficient of $b_2$, the value of $b_2$ is obtained; this
is shown in row (11).

A column for sums is provided in order to obtain a step-by-
step check on all calculations. This is done by applying to
the sums the same multipliers as those applied to the equations.
For example, the sum of row (1) multiplied by -1/19 should equal
the sum of row (2). In the column headed Checks are entered
the products obtained by multiplying the sums as indicated
under Remarks, to visualize the checks.

From Table 40 the values of a, $b_1$, and $b_2$ are obtained from the
equations in rows (2), (6), and (11) as follows:

Row (2):   -a - 1.864526b1 - 3.59157b2 + 16.135789 = 0
Row (6):   -b1 - 3.76733b2 - 13.2095 = 0
Row (11):  -b2 + 7.9944 = 0

b2 = 7.9944
b1 = -3.76733(7.9944) - 13.2095
   = -43.327
a  = 43.327(1.864526) - 7.9944(3.59157) + 16.135789
   = 68.2077

The equation of regression of X1 on X2 is therefore as follows:

$$X_1 = 68.2077 - 43.327X_2 + 7.9944X_2^2$$
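The idea of carrying a sum column through the elimination can be sketched in code. This is not a reproduction of Table 40: it is a plain forward elimination with back substitution, written in Python (an assumption), with a hypothetical function name `solve_with_checks`. What it shares with the work sheet is the running check: an extra entry holding each row's sum is transformed along with the row and compared with the recomputed sum after every step. Because the system is sensitive to rounding, the solution from unrounded arithmetic differs slightly from the hand-computed coefficients.

```python
def solve_with_checks(rows):
    """Solve a small linear system given as rows [coefficients..., right side].

    A check entry (the sum of the row) is appended to every row and carried
    through each operation; it must always equal the recomputed row sum.
    """
    aug = [list(r) + [sum(r)] for r in rows]
    n = len(rows)
    for p in range(n):
        pivot = aug[p][p]
        aug[p] = [v / pivot for v in aug[p]]           # make pivot coefficient 1
        for r in range(p + 1, n):
            factor = aug[r][p]
            aug[r] = [v - factor * pv for v, pv in zip(aug[r], aug[p])]
        for r in range(p, n):                           # step-by-step check
            expected = sum(aug[r][:-1])
            assert abs(aug[r][-1] - expected) < 1e-6 * max(1.0, abs(expected))
    solution = [0.0] * n
    for p in reversed(range(n)):                        # back substitution
        solution[p] = aug[p][n] - sum(aug[p][c] * solution[c] for c in range(p + 1, n))
    return solution

# Eqs. (I) to (III) as printed in the text.
equations = [
    [19.0,    35.426,   68.2398,  306.58],
    [35.426,  68.2398,  135.4744, 542.7359],
    [68.2398, 135.4744, 276.3974, 994.4092],
]
a, b1, b2 = solve_with_checks(equations)
print(round(a, 3), round(b1, 3), round(b2, 3))
```

A failed check localizes an arithmetic slip to the row operation just performed, which is the purpose of the sum column on the hand work sheet.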

Estimates Based on Regression Equations. Using the three
equations of regression calculated above for the regression of
$X_1$ on $X_2$, that is, for the regression of the price of cotton on
production, stocks, and imports of cotton in the United States,
estimates may be made of the price that will result from a given
volume of stocks plus production plus imports. The three
equations are as follows:

Logarithmic regression:  log X1 = 1.5833 - 1.5081 log X2
Reciprocal regression:   1/X1 = -0.03538 + 0.05564X2
Parabolic regression:    X1 = 68.2077 - 43.327X2 + 7.9944X2^2

To illustrate the method of estimation, suppose the questions
are asked: What is the expected price of cotton if the cotton
stocks plus the year's production and imports amount to 25 mil-
lion bales? What is the expected price of cotton if the cotton
stocks plus the year's production and imports amount to 22
million bales? 19 million bales? 16 million bales? 13 million
bales? Only 10 million bales? How much higher will the
price be in a year of shortage than in a year of large carry-over
and large production and imports of cotton? Table 41 shows
how these estimates are made by using the above three equations
of regression. When the year's cotton stocks, production, and
imports are large, the three regression equations give results
that are approximately equal to each other, but when the year's
cotton stocks, production, and imports are small, the estimates
based upon the three regression equations differ sharply from
each other.

Table 41. Estimates of Cotton Prices Based on Three Regression Curves

Estimates based on logarithmic regression

Values    log X2     Equation of estimate                  log X1     Estimate
of X2                1.5833 - 1.5081 log X2 = log X1                  of X1
2.5       0.39794    1.5833 - 1.5081(0.39794) =            0.98317     9.62
2.2       0.34242    1.5833 - 1.5081(0.34242) =            1.06690    11.67
1.9       0.27875    1.5833 - 1.5081(0.27875) =            1.16292    14.55
1.6       0.20412    1.5833 - 1.5081(0.20412) =            1.27547    18.86
1.3       0.11394    1.5833 - 1.5081(0.11394) =            1.46612    29.25
1.0       0.00000    1.5833 - 1.5081(0.00000) =            1.58330    38.31

Estimates based on reciprocal regression

Values    Equation of estimate                  1/X1       Estimate
of X2     -0.03538 + 0.05564X2 = 1/X1                      of X1
2.5       -0.03538 + 0.05564(2.5) =             0.10372     9.64
2.2       -0.03538 + 0.05564(2.2) =             0.08703    11.49
1.9       -0.03538 + 0.05564(1.9) =             0.07034    14.22
1.6       -0.03538 + 0.05564(1.6) =             0.05364    18.64
1.3       -0.03538 + 0.05564(1.3) =             0.03695    27.06
1.0       -0.03538 + 0.05564(1.0) =             0.02026    49.36

Estimates based on parabolic regression

Values    Equation of estimate                               Estimate
of X2     68.2077 - 43.327X2 + 7.9944X2^2 = X1               of X1
2.5       68.2077 - 43.327(2.5) + 7.9944(6.25) =              9.85
2.2       68.2077 - 43.327(2.2) + 7.9944(4.84) =             11.58
1.9       68.2077 - 43.327(1.9) + 7.9944(3.61) =             14.75
1.6       68.2077 - 43.327(1.6) + 7.9944(2.56) =             19.35
1.3       68.2077 - 43.327(1.3) + 7.9944(1.69) =             25.39
1.0       68.2077 - 43.327(1.0) + 7.9944(1.00) =             32.87
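The three equations of estimate can be evaluated side by side with a short sketch. Python is an assumption, and the function names are hypothetical; the coefficients are those stated in the text, and supply is expressed in units of 10 million bales. Three supply levels are shown to illustrate how the curves agree at large supplies and separate at small ones.

```python
import math

def log_estimate(x2):
    """Logarithmic regression: log X1 = 1.5833 - 1.5081 log X2 (common logs)."""
    return 10 ** (1.5833 - 1.5081 * math.log10(x2))

def reciprocal_estimate(x2):
    """Reciprocal regression: 1/X1 = -0.03538 + 0.05564 X2."""
    return 1.0 / (-0.03538 + 0.05564 * x2)

def parabolic_estimate(x2):
    """Parabolic regression: X1 = 68.2077 - 43.327 X2 + 7.9944 X2^2."""
    return 68.2077 - 43.327 * x2 + 7.9944 * x2 ** 2

for x2 in (2.5, 1.6, 1.0):
    print(x2,
          round(log_estimate(x2), 2),
          round(reciprocal_estimate(x2), 2),
          round(parabolic_estimate(x2), 2))
```

The reciprocal form has no finite estimate where a + bX2 reaches zero, which is one reason its estimates climb so steeply as the supply becomes small.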

First-order Standard Deviation Used as Standard Error of
Estimate. The dispersion about a curve of regression can be
measured in the same manner as the dispersion of cases about a
progression of means or a line of regression. The measure
generally used is the standard deviation and is called a "first-
order standard deviation" or a "standard error of estimate,"
because it is the standard deviation of the residuals about the
curves of regression by means of which estimates such as those
illustrated in Table 41 are made.

For the illustration in which cotton stocks, production, and
imports are correlated with cotton prices, three types of
regression lines have been fitted, as follows:

$$\log X_1 = a + b\log X_2 \tag{A}$$
$$\frac{1}{X_1} = a + bX_2 \tag{B}$$
$$X_1 = a + b_1X_2 + b_2X_2^2 \tag{C}$$

The standard error of estimate, being a standard deviation, is
defined as follows:

$$N\sigma_{1.2}^2 = \sum d^2 \tag{17}$$

where each d is defined, taking regression type (C), for example,
as

$$d = (X_1 - X_1') = X_1 - a - b_1X_2 - b_2X_2^2 \tag{18}$$

Hence, each $d^2$ will be as follows:

$$d^2 = d(X_1 - a - b_1X_2 - b_2X_2^2) = dX_1 - ad - b_1X_2d - b_2X_2^2d \tag{19}$$

If all these $d^2$'s are added, the following result is obtained:

$$\sum d^2 = \sum X_1d - a\sum d - b_1\sum X_2d - b_2\sum X_2^2d \tag{20}$$

By the least-squares condition, however, the last three terms
of Eq. (20) are equal to zero, for*

$$\sum d = \sum(X_1 - a - b_1X_2 - b_2X_2^2) = 0$$
$$\sum X_2d = \sum X_2(X_1 - a - b_1X_2 - b_2X_2^2) = 0$$
$$\sum X_2^2d = \sum X_2^2(X_1 - a - b_1X_2 - b_2X_2^2) = 0$$

* See p. 384.


Therefore, Eq. (20) reduces to the following:

$$\sum d^2 = \sum X_1d = \sum X_1(X_1 - a - b_1X_2 - b_2X_2^2) = \sum X_1^2 - a\sum X_1 - b_1\sum X_1X_2 - b_2\sum X_1X_2^2 \tag{21}$$

Accordingly, the formula for the square of the standard error of
estimate is as follows:

$$\sigma_{1.2}^2 = \frac{\sum X_1^2 - a\sum X_1 - b_1\sum X_1X_2 - b_2\sum X_1X_2^2}{N} \tag{22}$$

If regression type (B) were taken, it can be shown similarly
that

$$\sigma_{1.2}^2 = \frac{\sum\left(\dfrac{1}{X_1}\right)^2 - a\sum\dfrac{1}{X_1} - b\sum\dfrac{X_2}{X_1}}{N} \tag{23}$$

If the logarithmic regression equation is chosen, the standard
error of estimate is found by a similar procedure to be as follows:

$$\sigma_{1.2}^2 = \frac{\sum\log^2 X_1 - a\sum\log X_1 - b\sum\log X_1\log X_2}{N}$$

The values necessary to calculate these standard errors of 
estimate are available, respectively, in Tables 40, 39, and 38. 

Calculation of standard error of estimate: For the logarithmic
regression:¹

$$\sigma_{1.2}^2 = \frac{27.0945 - (1.5833)(22.5405) - (-1.5081)(5.7468)}{19} = \frac{27.0945 - 35.6884 + 8.6668}{19} = \frac{0.0729}{19} = 0.0038$$

$$\sigma_{1.2} = 0.06164$$

By using the ordinary formula for the standard deviation,

$$\sigma_1^2 = \frac{\sum\log^2 X_1}{N} - \left(\frac{\sum\log X_1}{N}\right)^2$$

(when the arbitrary origin is taken as zero), the necessary figures
are found in totals of the appropriate columns of Table 37, and
it is found that

$$\sigma_1^2 = 0.0187 \qquad \sigma_1 = 0.1368$$

¹ The scatter formula for the logarithmic regression could be calculated
by using the formula employed in the linear case, as follows:

$$\sigma_{1.2}^2 = \sigma_1^2\left(1 - r_{\log X_1 \log X_2}^2\right)$$

Since, however, the logarithmic r has not been calculated, it is simpler to
use the formula based on the least-squares equations.

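For the reciprocal regression:¹

$$\sigma_{1.2}^2 = \frac{0.09749 - (-0.03538)(1.29896) - (0.05564)(2.54365)}{19} = \frac{0.09749 + 0.04596 - 0.14153}{19} = \frac{0.00192}{19} = 0.000101$$

$$\sigma_{1.2} = 0.01$$

The standard deviation of $1/X_1$ is found by using the following
formula:

$$\sigma_{1/X_1}^2 = \frac{\sum\left(\dfrac{1}{X_1}\right)^2}{N} - \left(\frac{\sum\dfrac{1}{X_1}}{N}\right)^2$$

The necessary values are found in the sums of the appropriate
columns of Table 38.

$$\sigma_{1/X_1}^2 = 0.00869 \qquad \sigma_{1/X_1} = 0.0932$$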

For the parabolic regression:

$$\sigma_{1.2}^2 = \frac{5{,}451.3758 - 68.2077(306.58) - (-43.327)(542.7359) - (7.9944)(994.4092)}{19}$$

$$= \frac{5{,}451.3758 - 20{,}911.1167 + 23{,}515.1183 - 7{,}949.7049}{19} = \frac{105.6725}{19} = 5.5617$$

$$\sigma_{1.2} = 2.3583$$

Using the ordinary formula for calculating the standard deviation
when zero is taken as the arbitrary origin,

$$\sigma_1^2 = 26.5508 \qquad \sigma_1 = 5.1528$$
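The three standard errors of estimate all follow one pattern: the sum of squares of the dependent variable, less each regression statistic times its cross-product sum, divided by N. A sketch of that shortcut, in Python (an assumption) with a hypothetical function name, using the sums and coefficients quoted in the text:

```python
import math

def standard_error_squared(sum_y_squared, coefficients, cross_sums, n):
    """Shortcut of Eqs. (22) and (23): (sum y^2 - sum of coef * cross sum) / N."""
    explained = sum(c * s for c, s in zip(coefficients, cross_sums))
    return (sum_y_squared - explained) / n

N = 19

# Parabolic regression: y = X1; cross sums are sum X1, sum X1 X2, sum X1 X2^2.
parabolic = standard_error_squared(
    5451.3758, (68.2077, -43.327, 7.9944), (306.58, 542.7359, 994.4092), N)

# Reciprocal regression: y = 1/X1; cross sums are sum 1/X1, sum X2/X1.
reciprocal = standard_error_squared(
    0.09749, (-0.03538, 0.05564), (1.29896, 2.54365), N)

# Logarithmic regression: y = log X1; cross sums are sum log X1, sum log X1 log X2.
logarithmic = standard_error_squared(
    27.0945, (1.5833, -1.5081), (22.5405, 5.7468), N)

for name, value in (("parabolic", parabolic),
                    ("reciprocal", reciprocal),
                    ("logarithmic", logarithmic)):
    print(name, value, math.sqrt(value))
```

The numerator is a small difference of large terms, so the coefficients and sums must be carried to several decimals; rounding them early can change the result noticeably.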

¹ The scatter formula for the reciprocal regression could be calculated by
using the formula for the linear case, as follows:

$$\sigma_{1.2}^2 = \sigma_{1/X_1}^2\left(1 - r_{(1/X_1)X_2}^2\right)$$

Since, however, the reciprocal r has not been calculated, it is simpler to use
the formula based on the least-squares equations.


Table 42 is a summary of the estimates of cotton prices made above, together with ranges of plus and minus one standard error of estimate.

Table 42. Estimates and Ranges of Twice the Standard Error of Estimate for Cotton Prices Based on Three Regression Curves

Estimates and ranges, logarithmic regression

Estimated log    Range in logarithms                              Estimated     Range of price, antilogarithms
of price,        $\log X_1' + \sigma_{1.2}$   $\log X_1' - \sigma_{1.2}$   price, $X_1'$   upper      lower
$\log X_1'$
0.98317          1.04481          0.92153                          9.62          11.09      8.35
1.06690          1.12854          1.00526                         11.67          13.45     10.12
1.16292          1.22456          1.10128                         14.55          16.77     12.63
1.27547          1.33711          1.21383                         18.86          21.73     16.36
1.46612          1.52776          1.40448                         29.25          33.71     25.38
1.58330          1.64494          1.52166                         38.31          44.15     33.24

Estimates and ranges, reciprocal regression

Estimated        Range in reciprocals                             Estimated     Range of estimated price,
reciprocal of    $1/X_1' + \sigma_{1.2}$   $1/X_1' - \sigma_{1.2}$       price, $X_1'$   converted from reciprocals
price, $1/X_1'$                                                                  lower      upper
0.10372          0.11372          0.09372                          9.64           8.79     10.67
0.08703          0.09703          0.07703                         11.49          10.31     12.98
0.07034          0.08034          0.06034                         14.22          12.45     16.57
0.05364          0.06364          0.04364                         18.64          15.71     22.91
0.03695          0.04695          0.02695                         27.06          21.30     37.11
0.02026          0.03026          0.01026                         49.36          33.05     97.46

Estimates and ranges, parabolic regression

Estimated        Standard error of estimate
price, $X_1'$    $X_1' + \sigma_{1.2}$   $X_1' - \sigma_{1.2}$
 9.85            12.21             7.49
11.58            13.94             9.22
14.75            17.11            12.39
19.35            21.71            16.99
25.39            27.75            23.03
32.87            35.23            30.51

In the cases of the logarithmic and reciprocal regressions these ranges are converted into original units of the data in order to show their significance. The differences are notable. For the lower levels of price, the reciprocal regression gives estimates with small standard errors of estimate, but for the higher price levels the standard error of estimate is smallest with the parabolic regression. Each of these methods of calculating regression curves assumes that the variance in $X_1$ is the same for all subgroups of $X_1$ associated with varying values of $X_2$. The logarithmic regression assumes that, when converted into logarithms, the variance about the logarithmic regression is equal at all points but that, when converted into antilogarithms, it will be larger for the higher prices. The reciprocal regression assumes equal variance about the curve in terms of reciprocals but, when converted, the variance about the higher prices is larger than the variance about the lower prices.
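The conversion of a range from transformed units back into price units can be sketched in a few lines. The following is only an illustration, not the authors' worksheet: the function names are hypothetical, and the half-widths 0.06164 (logarithms) and 0.01 (reciprocals) are inferred from the spacing of the tabulated ranges in Table 42 rather than quoted from the text.

```python
def log_range(log_estimate, sigma_log):
    """Convert a base-10 log estimate and its +/- sigma band into price units."""
    estimate = 10 ** log_estimate
    upper = 10 ** (log_estimate + sigma_log)
    lower = 10 ** (log_estimate - sigma_log)
    return estimate, upper, lower


def reciprocal_range(recip_estimate, sigma_recip):
    """Convert a reciprocal estimate and its +/- sigma band into price units.

    Adding sigma to the reciprocal yields the LOWER price limit, because a
    larger reciprocal corresponds to a smaller price.
    """
    estimate = 1 / recip_estimate
    lower = 1 / (recip_estimate + sigma_recip)
    upper = 1 / (recip_estimate - sigma_recip)
    return estimate, upper, lower


# First row of the logarithmic and reciprocal panels of Table 42.
# The half-widths are assumptions inferred from the table itself.
log_est, log_up, log_lo = log_range(0.98317, 0.06164)
rec_est, rec_up, rec_lo = reciprocal_range(0.10372, 0.01)

print(f"logarithmic: {log_est:.2f}  ({log_lo:.2f} to {log_up:.2f})")
print(f"reciprocal:  {rec_est:.2f}  ({rec_lo:.2f} to {rec_up:.2f})")
```

A band that is symmetrical in logarithms or reciprocals becomes lopsided in price units: the upper limit lies farther from the estimate than the lower limit, and the gap widens as the price rises.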

The question suggests itself: Which one of these three assumptions about the character of variance about the curves of regression best suits the data of the particular problem? This question is answered by determining which of the regression curves is the best fit for the data in question.

Correlation Index. For each of the curves of regression calculated in the previous section, a corresponding index of correlation will help to determine which of the regression curves is the best fit for the data. The standard error of estimate measures the divergence of the bivariates from the curve of regression; the correlation index measures the goodness of fit of the curve of regression. The indexes of correlation may be calculated by using Eq. (2).

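Calculation of Indexes of Correlation. For the logarithmic regression:

$$I_{12}^2 = \frac{\sigma_1^2 - \sigma_{1.2}^2}{\sigma_1^2} = \frac{0.0187 - 0.0038}{0.0187} = \frac{0.0149}{0.0187} = 0.7968$$

$$I_{12} = 0.8926$$

For the reciprocal regression:

$$I_{12}^2 = \frac{0.00869 - 0.00010}{0.00869} = \frac{0.00859}{0.00869} = 0.9885$$

$$I_{12} = 0.9942$$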




For the parabolic regression:

$$I_{12}^2 = \frac{26.5508 - 5.5617}{26.5508} = \frac{20.9891}{26.5508} = 0.7905$$

$$I_{12} = 0.8891$$
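The three calculations above follow one pattern, which may be sketched as below. This is an illustration only; the function name and the dictionary layout are hypothetical, while the variances are those quoted in the calculations. Each pair of variances is expressed in the units of its own regression (logarithms, reciprocals, or original prices).

```python
import math


def correlation_index(total_variance, scatter_variance):
    """Return (I squared, I): the share of variance accounted for, and its root."""
    index_squared = (total_variance - scatter_variance) / total_variance
    return index_squared, math.sqrt(index_squared)


# (total variance, squared standard error of estimate) for each curve,
# each pair measured in the units in which that curve was fitted.
curves = {
    "logarithmic": (0.0187, 0.0038),
    "reciprocal": (0.00869, 0.00010),
    "parabolic": (26.5508, 5.5617),
}

results = {name: correlation_index(*pair) for name, pair in curves.items()}

for name, (index_squared, index) in results.items():
    print(f"{name:12s} I^2 = {index_squared:.4f}   I = {index:.4f}")
```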

The high correlation index obtained for the reciprocal regression appears to indicate that the cotton supply and price data for the period 1900 to 1940 are correlated in a reciprocal manner. It indicates that the sample data are fitted by the reciprocal curve of regression better than by either the logarithmic curve or the parabolic curve.

It is to be noted that, in general, the use of the index of correlation to show which curve is the best fit is valid only when all curves have the same number of regression statistics. Here two curves had two regression statistics and one had three. A curve with a larger number of regression statistics will always give a better fit than a similar curve with a smaller number of regression statistics. Here, however, the parabola that had three regression statistics gives a worse fit than either the logarithmic or the reciprocal curve, each of which has only two regression statistics.

The Index of Correlation and Analysis of Variance. As already pointed out, in the cases of the logarithmic and reciprocal curves of regression, the Pearsonian coefficient of correlation may be calculated. When transformed into original units, this coefficient of correlation becomes the index of correlation. In the problems above illustrated, however, the correlation index was calculated instead by using the general formula based upon the scatter because the arithmetic involved in the latter method is simpler. In logarithmic and reciprocal units, respectively, the coefficient of correlation squared is, for these curves of regression, a coefficient of proportional variance just as is the $r^2$ for simple linear correlation problems.

For the parabolic curve of regression, the deviations from the curve of regression may be described as

$$X_1 - X_1' = d \qquad \text{and} \qquad X_1' = X_1 - d$$

If these are added for the entire data, the result is

$$\Sigma X_1' = \Sigma X_1 - \Sigma d$$

and since $\Sigma d = 0$ it follows that

$$\Sigma X_1' = \Sigma X_1 = N\bar{X}_1$$

and hence the mean of $X_1'$ equals the mean of $X_1$.

Consequently, the sum of squares of $X_1'$ about its mean may be obtained as follows:

$$N\sigma_{1'}^2 = \Sigma X_1'^2 - N\bar{X}_1^2 \qquad (25)$$

In Eq. (25), $\Sigma X_1'^2$ may be evaluated as follows:

$$\Sigma X_1'^2 = \Sigma (X_1 - d)^2 = \Sigma X_1^2 - 2\Sigma X_1 d + \Sigma d^2$$

As shown above on page 391, $\Sigma X_1 d = \Sigma d^2$. Therefore,

$$\Sigma X_1'^2 = \Sigma X_1^2 - \Sigma d^2$$

and

$$N\sigma_{1'}^2 = \Sigma X_1^2 - \Sigma d^2 - N\bar{X}_1^2 \qquad (26)$$

However, it is true by definition that

$$\Sigma X_1^2 - N\bar{X}_1^2 = N\sigma_1^2 \qquad \text{and} \qquad \Sigma d^2 = N\sigma_{1.2}^2$$

Therefore, Eq. (26) reduces to the following:

$$N\sigma_{1'}^2 = N\sigma_1^2 - N\sigma_{1.2}^2 \qquad (27)$$

From Eq. (27), by dividing by $N$ and then by $\sigma_1^2$ and transposing terms, it follows that

$$\frac{\sigma_{1'}^2}{\sigma_1^2} = 1 - \frac{\sigma_{1.2}^2}{\sigma_1^2} \qquad (28)$$

and from Eq. (28) it follows by definition of $I_{12}$ that

$$I_{12}^2 = \frac{\sigma_{1'}^2}{\sigma_1^2} \qquad (29)$$

Hence the square of the correlation index has the same significance as the square of the linear coefficient of correlation; it measures the proportion of the total variance accounted for by the assumed type of curvilinear correlation.
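The decomposition in Eqs. (25) to (29) can be checked numerically. The sketch below is an illustration under stated assumptions: the eight data pairs are invented for the purpose, the helper names are hypothetical, and the parabola is fitted by solving the three normal equations directly.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_parabola(x2, x1):
    """Least-squares fit of x1 = a + b*x2 + c*x2**2 from the normal equations."""
    powers = [sum(v ** k for v in x2) for k in range(5)]
    matrix = [[powers[i + j] for j in range(3)] for i in range(3)]
    rhs = [sum(y * v ** i for v, y in zip(x2, x1)) for i in range(3)]
    return solve(matrix, rhs)


def variance(values):
    """Variance about the mean, dividing by N as in the text."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical data, invented only to exercise the identity.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x1 = [30.1, 22.4, 17.9, 14.2, 12.8, 11.1, 10.9, 10.2]

a, b, c = fit_parabola(x2, x1)
fitted = [a + b * v + c * v * v for v in x2]
d = [actual - estimate for actual, estimate in zip(x1, fitted)]

total = variance(x1)                      # sigma_1 squared
explained = variance(fitted)              # variance of the estimated values
scatter = sum(e * e for e in d) / len(d)  # squared standard error of estimate
index_squared = explained / total         # Eq. (29)

print(f"sum of deviations       = {sum(d):.2e}")
print(f"total variance          = {total:.4f}")
print(f"explained + scatter     = {explained + scatter:.4f}")
print(f"index squared, Eq. (29) = {index_squared:.4f}")
print(f"1 - scatter/total       = {1 - scatter / total:.4f}")
```

The last two printed figures agree, which is the content of Eq. (28): the share of variance accounted for by the curve equals one minus the share left as scatter.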



CHAPTER XVI 

MULTIPLE AND PARTIAL CORRELATION 


To deal with the relationship between only two variables 
the method of correlation so far discussed is useful, but in the 
nonexperimental sciences it is frequently and indeed usually 
more important to be able to deal with the association between 
three or more variables. In the social sciences in particular, 
variations in practically every factor are related to variations 
in several other rather than in a single other factor. For exam- 
ple, variations in the price of cotton are related not only to 
changes in the production and consumption of cotton but also 
to changes in the prices of substitutes for cotton such as rayon 
and, in addition, to changes in the value of money. Again, the 
consumption of a commodity such as gasoline may depend more 
upon the number of automobiles in existence and upon the 
number of miles of hard-surfaced roads available for use than 
upon the price of gasoline. As a matter of fact, it is dependent 
on all these factors and others too. In such cases it is essential 
to have some method of “multiple correlation” and “partial 
correlation.” 

Definitions of Terms. Multiple Correlation. Multiple correlation is an extension to more than two variables of the methods of simple correlation. Simple linear correlation provides a line of regression from which an average value for the dependent variable may be estimated if the value of the independent variable is given. Multiple linear correlation provides a “plane” of regression by means of which an average value for the dependent variable may be estimated if the values of two or more independent variables are given. The plane of regression of the price of cotton on the price of rayon and on the wholesale price level, for example, would permit the estimation of the former from joint knowledge of the latter, instead of from the price of rayon alone. Similarly, the plane of regression of the second-semester English grade on the first-semester English grade and on the verbal scholastic aptitude test grade would permit the estimation of the former from joint knowledge of the latter, instead of from only the first-semester English grade. The regression equation, accordingly, has two or more terms to the right instead of one; its general form is as follows:

$$X_1' = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3 + \cdots$$

where $X_1$ is the dependent variable, $X_2$, $X_3$, etc., are the independent variables, and $a$ and the $b$'s are estimated parameters, or regression statistics, whose numerical values are determined in any particular case by the method of least squares. The numerical subscripts will be explained later. For the moment it need only be noted that a plane of regression is but the extension to more than two variables of the idea of a line of regression.

In simple linear correlation, dispersion about the line of regression of $X_1$ on $X_2$ serves as a measure of the accuracy of any estimate of $X_1$ made from the line of regression. In multiple correlation, dispersion about the plane of regression serves as a measure of the accuracy of any estimate of the dependent variable made by reference to the plane of regression. One of the essential problems of multiple correlation is to calculate dispersion about the plane of regression.

In simple correlation, a line of regression is merely a law of relationship between one variable taken as a dependent variable and another taken as an independent variable; it does not of itself describe the degree of relationship or association that exists. To measure the degree of linear association is the function of the coefficient of correlation. Since the coefficient of correlation measures the amount of linear association, it also serves as a measure of the goodness of fit of the linear-regression equation to the bivariate distribution and yields a measure of the general degree of accuracy of estimates made by reference to the regression equation. In multiple correlation, the coefficient of multiple correlation serves the same general function. First, it serves as a measure of the degree of association between one variable taken as the dependent variable and a group of other variables taken as the independent variables. Hence, it also serves as a measure of the goodness of fit of the calculated plane of regression and consequently as a measure of the general degree of accuracy of estimates made by reference to the equation for the plane of regression.

In simple linear correlation, relationships are completely described by two lines of regression, one in which $X_1$ is taken as the dependent variable and the other in which $X_2$ is the dependent variable. In multiple correlation involving three variables, there are three planes of regression. If four variables are involved, there are four planes of regression, and so forth. In general, there are as many planes of regression as there are variables that may be taken as dependent variables, in short, as many planes of regression as variables. In particular cases, the intuitive sense of cause and effect may lead to the rejection of some of these possible planes of regression as being without any practical significance. They must always, however, be considered as theoretical possibilities.

Where only two variables are considered, the coefficient of correlation between $X_1$, taken as dependent, and $X_2$, taken as independent, is the same as the coefficient of correlation between $X_2$, taken as dependent, and $X_1$, taken as independent. The measure of goodness of fit of the line of regression of $X_1$ on $X_2$ is the same as the measure of the goodness of fit of the line of regression of $X_2$ on $X_1$. This cannot be said of the various multiple-correlation coefficients. The multiple-correlation coefficient that measures the degree of association between $X_1$, dependent, and $X_2$ and $X_3$, independent, as a group and that also serves as a measure of the goodness of fit of the plane of regression of $X_1$ on $X_2$ and $X_3$ is not the same as the coefficient of multiple correlation that measures the degree of association of $X_2$, dependent, with $X_1$ and $X_3$, independent, taken as a group and that also measures the goodness of fit of the plane of regression of $X_2$ on $X_1$ and $X_3$. Furthermore, neither of these two coefficients is equal, except by mere chance, to the coefficient of multiple correlation that measures the degree of association of $X_3$, dependent, with $X_1$ and $X_2$, independent, taken together and that also measures the goodness of fit of the plane of regression of $X_3$ on $X_1$ and $X_2$. In multiple correlation, there are as many different coefficients of multiple correlation as there are planes of regression.
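A small numerical sketch may make the contrast concrete. It assumes the standard three-variable expression for the multiple-correlation coefficient in terms of the simple coefficients, $R_{1.23}^2 = (r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23})/(1 - r_{23}^2)$, which is not stated in this passage; the three simple coefficients are hypothetical values chosen only for illustration.

```python
import math


def multiple_r(r_dep_a, r_dep_b, r_ab):
    """Multiple correlation of a dependent variable with two independent ones.

    r_dep_a, r_dep_b: simple correlations of the dependent variable with each
    independent variable; r_ab: simple correlation between the two
    independent variables.
    """
    r_squared = (r_dep_a ** 2 + r_dep_b ** 2
                 - 2 * r_dep_a * r_dep_b * r_ab) / (1 - r_ab ** 2)
    return math.sqrt(r_squared)


# Hypothetical simple correlation coefficients. Each is symmetrical:
# r12 serves for X1 on X2 and for X2 on X1 alike.
r12, r13, r23 = 0.8, 0.6, 0.5

R1_23 = multiple_r(r12, r13, r23)  # X1 dependent; X2 and X3 independent
R2_13 = multiple_r(r12, r23, r13)  # X2 dependent; X1 and X3 independent
R3_12 = multiple_r(r13, r23, r12)  # X3 dependent; X1 and X2 independent

print(f"R1.23 = {R1_23:.4f}")
print(f"R2.13 = {R2_13:.4f}")
print(f"R3.12 = {R3_12:.4f}")
```

One set of simple coefficients thus yields three different multiple coefficients, one for each plane of regression.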

Linear vs. Nonlinear Relationships. The simplest form of correlation analysis rests on the assumption that the association between the variables is of a linear type. In some cases, this assumption does violence to the facts, the association being clearly of a nonlinear form. Where a simple form of nonlinear relationship exists between two variables, it has been found possible to fit a curve of regression instead of a line of regression and to calculate a correlation coefficient that measures the goodness of fit of this curve. Whether such a simple curve can be fitted or not, it is possible to calculate a measure of nonlinear relationship, called the “correlation ratio,” that depends on a comparison of the variation about the means of the columns (or rows) of the grouped data with the total variation in the data.*

Such devices as these can also be used when nonlinear relationships exist among three or more variables. When the nonlinear relationship takes a simple form, it is possible to fit a curved plane or a surface of regression. A multiple-correlation index $I_{1.23}$ can also be calculated to serve as a measure of the goodness of fit of this surface of regression. Whether a simple form of a curved surface can be fitted or not, it is always possible to calculate a multiple-correlation ratio of the same sort as the correlation ratio for only two variables. Similar nonlinear relationships can also be carried over into the analysis of partial correlation.

Partial Correlation. Partial correlation is concerned with a concept resulting from the fact that more than two variables are correlated; if only two variables are considered, there is no place for partial correlation. Where there are three or more variables, however, the question of the interrelationships between the variables becomes a part of the analysis. How much of the apparent association between two variables ($X_1$ and $X_2$) is due to their common association with a third variable ($X_3$) and how much to their direct connection or to some connection through other variables independent of $X_3$? Would $X_1$ and $X_2$ continue to vary together if $X_3$ were held constant? This is the new problem that partial correlation attempts to solve. Fortunately, the methods employed in its solution are the same fundamentally as those involved in simple linear correlation.
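One way to see that the methods are those of simple linear correlation is sketched below. It assumes the usual first-order formula $r_{12.3} = (r_{12} - r_{13}r_{23})/\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}$, which this passage does not state, and compares it with the simple correlation between what remains of $X_1$ and of $X_2$ after the straight-line effect of $X_3$ has been removed from each. The data and function names are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Simple (Pearsonian) coefficient of correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Deviations of y from its least-squares line of regression on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [(b - my) - slope * (a - mx) for a, b in zip(x, y)]


# Hypothetical observations on three variables.
x1 = [10.2, 11.5, 12.1, 13.8, 15.0, 15.9, 17.3, 18.4]
x2 = [1.0, 1.4, 1.1, 1.9, 2.4, 2.0, 2.9, 3.1]
x3 = [5.0, 4.2, 6.1, 5.5, 7.9, 6.8, 9.0, 7.7]

r12, r13, r23 = pearson(x1, x2), pearson(x1, x3), pearson(x2, x3)

# Partial correlation from the simple coefficients (assumed standard formula).
partial_formula = (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# The same quantity as a simple correlation between two sets of residuals.
partial_residual = pearson(residuals(x1, x3), residuals(x2, x3))

print(f"r12            = {r12:.4f}")
print(f"r12.3 formula  = {partial_formula:.4f}")
print(f"r12.3 residual = {partial_residual:.4f}")
```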

This chapter is primarily concerned with linear multiple and partial correlation involving three variables. The notation involved in multiple and partial correlation will first be summarized. A brief discussion of a multivariate frequency distribution, upon the basis of which any form of multiple or partial analysis must be based, will follow. Ensuing sections of the chapter will explain the fitting of planes of regression and will derive formulas for finding the numerical values of the regression statistics of any given plane fitted by the method of least squares. Formulas for measuring dispersion and for calculating multiple-correlation coefficients will also be derived. Partial correlation will be explained in more detail, and methods of calculating partial-correlation coefficients will be indicated. In the next chapter the entire subject will be illustrated by an example.

* Chap. XV.

Notation. It is the practice in multiple- and partial-correlation analysis to let a symbol indicate the class to which a given quantity belongs and to denote by subscripts the particular member of the designated class. For example, if $X$ stands for any variable measured in original units, $X_1$ indicates a particular member of this group and its subscript distinguishes it from $X_2$, $X_3$, etc., which are other members of the group. In a designated problem, $X_1$ may be the price of cotton, $X_2$ the price of rayon, and $X_3$ the general price level. Following is a summary of the various symbols used in the subsequent analysis, in which special attention should be directed to the subscripts:

$X_1, X_2, X_3$
    Variables measured in original units.

$X_1', X_2', X_3'$
    The estimated values of these variables given by the three regression equations in which the variables are taken as dependent. The primes distinguish them from the actual values of $X_1$, $X_2$, and $X_3$.

$x_1, x_2, x_3$
    Variables measured from their means as origins ($x_1 = X_1 - \bar{X}_1$, etc.).

$x_1', x_2', x_3'$
    The estimated values of $X_1$, $X_2$, and $X_3$ given by their regression equations and measured from the means of $X_1$, $X_2$, $X_3$ ($x_1' = X_1' - \bar{X}_1$, etc.).

$\bar{X}_1, \bar{X}_2, \bar{X}_3$
    Means of $X_1$, $X_2$, $X_3$.

$\sigma_1, \sigma_2, \sigma_3$
    Standard deviations of $X_1$, $X_2$, $X_3$.

$x_1/\sigma_1,\; x_2/\sigma_2,\; x_3/\sigma_3$
    Variables measured from their means as origins and expressed in terms of standard-deviation units.

$x_1'/\sigma_1,\; x_2'/\sigma_2,\; x_3'/\sigma_3$
    $x_1'$, $x_2'$, $x_3'$ expressed in terms of the standard-deviation units of $X_1$, $X_2$, $X_3$.

$$X_1' = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3$$
$$X_2' = a_{2.13} + b_{21.3}X_1 + b_{23.1}X_3 \qquad (1)$$
$$X_3' = a_{3.12} + b_{31.2}X_1 + b_{32.1}X_2$$
    These are the equations for the three planes of regression in which the variables are measured in terms of original units. The $a$'s and $b$'s are the regression statistics of the equations, of which the explanation follows.

$a_{1.23}$
    The constant term in the regression equation in which $X_1$ is taken as the dependent variable and $X_2$ and $X_3$ as the independent variables.

$a_{2.13}$ and $a_{3.12}$
    These are the constant terms when $X_2$ and $X_3$, respectively, are the dependent variables. The subscript before the point refers to the dependent-variable number; the subscripts after the point refer to the independent variables. The order of subscripts after the point is immaterial; that is, $a_{1.23} = a_{1.32}$.

$b_{12.3}$
    The coefficient of $X_2$ in the regression equation in which $X_1$ is taken as the dependent variable and $X_3$ is the other independent variable. The first number in the subscript indicates the dependent variable; the second number in the subscript indicates the variable of which the $b$ is a coefficient; the point followed by the other subscript indicates that a third variable is considered. Similarly, $b_{13.2}$ is the coefficient of $X_3$ in the same regression equation. It is to be noted that $b_{12.3} \neq b_{21.3}$.

$b_{21.3}$
    The coefficient of $X_1$ in the regression equation in which $X_2$ is taken as the dependent variable and $X_3$ the other independent variable; $b_{23.1}$ is the coefficient of $X_3$ in the same regression equation.

$b_{31.2}$ and $b_{32.1}$
    These have a similar meaning for the third regression equation.

$$x_1' = b_{12.3}x_2 + b_{13.2}x_3$$
$$x_2' = b_{21.3}x_1 + b_{23.1}x_3 \qquad (2)$$
$$x_3' = b_{31.2}x_1 + b_{32.1}x_2$$
    Equations (2) are another form of the three regression equations. Here the variables are expressed in terms of deviations from their respective means. In these equations there are no $a$'s, or constant terms, because the planes of regression all pass through the point given by the means of the three variables. The $b$'s are the same as those in Eqs. (1).

$$\frac{x_1'}{\sigma_1} = \beta_{12.3}\frac{x_2}{\sigma_2} + \beta_{13.2}\frac{x_3}{\sigma_3}$$
$$\frac{x_2'}{\sigma_2} = \beta_{21.3}\frac{x_1}{\sigma_1} + \beta_{23.1}\frac{x_3}{\sigma_3} \qquad (3)$$
$$\frac{x_3'}{\sigma_3} = \beta_{31.2}\frac{x_1}{\sigma_1} + \beta_{32.1}\frac{x_2}{\sigma_2}$$
    Equations (3) give a third form in which the three regression equations may be written. Here the variables represent deviations from their respective means expressed in standard-deviation units [the $x'$'s are expressed in terms of the standard deviations of the $x$'s ($\sigma_1$, $\sigma_2$, $\sigma_3$) instead of the standard deviations of the $x'$'s themselves]. The form is similar to $x_1'/\sigma_1 = r\,x_2/\sigma_2$ for two variables.*

    In Eqs. (3), the $\beta$'s correspond to the $b$'s in Eqs. (1) and (2). As may be seen by comparing Eqs. (2) and (3), the $\beta$'s are related to the $b$'s in the following way:

$$b_{12.3} = \beta_{12.3}\frac{\sigma_1}{\sigma_2} \qquad b_{13.2} = \beta_{13.2}\frac{\sigma_1}{\sigma_3} \qquad b_{23.1} = \beta_{23.1}\frac{\sigma_2}{\sigma_3}$$
$$b_{21.3} = \beta_{21.3}\frac{\sigma_2}{\sigma_1} \qquad b_{31.2} = \beta_{31.2}\frac{\sigma_3}{\sigma_1} \qquad b_{32.1} = \beta_{32.1}\frac{\sigma_3}{\sigma_2} \qquad (4)$$

    If the symmetry of these equations is noted, they are easily remembered; for example, the subscripts of the $b$'s are the same as the subscripts of the $\beta$'s, and the order of the first two subscript numbers describes the subscripts for sigma in numerator and denominator, respectively. It is to be noted that $\beta_{12.3}$ does not equal $\beta_{21.3}$, etc.

* See pp. 349-351.

$\sigma_{1.23}$
    The scatter about the plane of regression of $X_1$ on $X_2$ and $X_3$.

$\sigma_{2.13}$
    The scatter about the plane of regression of $X_2$ on $X_1$ and $X_3$.

$\sigma_{3.12}$
    The scatter about the plane of regression of $X_3$ on $X_1$ and $X_2$.

$R_{1.23}$
    The multiple-correlation coefficient between $X_1$ on the one hand and $X_2$ and $X_3$ on the other.

$R_{2.13}$
    The multiple-correlation coefficient between $X_2$ on the one hand and $X_1$ and $X_3$ on the other.

$R_{3.12}$
    The multiple-correlation coefficient between $X_3$ on the one hand and $X_1$ and $X_2$ on the other.

$r_{12.3}$
    The partial-correlation coefficient between $X_1$ and $X_2$ when $X_3$ is held constant. The position of the subscripts is more important than the noncapitalization of the $r$ in distinguishing it from the multiple-correlation coefficients. The subscript after the point indicates which variable is held constant. $r_{12.3} = r_{21.3}$.

$r_{13.2}$
    The partial-correlation coefficient between $X_1$ and $X_3$ when $X_2$ is held constant.

$r_{23.1}$
    The partial-correlation coefficient between $X_2$ and $X_3$ when $X_1$ is held constant.

Study of the symmetry in the above system of notation will make it easy to remember. With the exception of the notation for partial-correlation coefficients, the order of subscripts before the point is always significant; following the point it is always immaterial.
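The equivalence of the three forms of the regression equation can be sketched for the plane of $X_1$ on $X_2$ and $X_3$. Everything numerical below is hypothetical (the means, standard deviations, and the two $b$'s are invented), and the function names are illustrative only; the constant term is obtained from the fact that the plane passes through the point of means.

```python
# Hypothetical means and standard deviations of X1, X2, X3.
means = {1: 12.0, 2: 30.0, 3: 100.0}
sigmas = {1: 4.0, 2: 5.0, 3: 20.0}

# Hypothetical net regression coefficients for the plane of X1 on X2 and X3.
b12_3, b13_2 = -0.30, 0.08

# Constant term of form (1): the plane passes through the point of means.
a1_23 = means[1] - b12_3 * means[2] - b13_2 * means[3]

# Betas of form (3), from Eqs. (4) solved for beta.
beta12_3 = b12_3 * sigmas[2] / sigmas[1]
beta13_2 = b13_2 * sigmas[3] / sigmas[1]


def form1(X2, X3):
    """Eqs. (1): variables in original units."""
    return a1_23 + b12_3 * X2 + b13_2 * X3


def form2(X2, X3):
    """Eqs. (2): deviations from the means, then restored to original units."""
    x1_est = b12_3 * (X2 - means[2]) + b13_2 * (X3 - means[3])
    return means[1] + x1_est


def form3(X2, X3):
    """Eqs. (3): standard-deviation units, then restored to original units."""
    z1_est = (beta12_3 * (X2 - means[2]) / sigmas[2]
              + beta13_2 * (X3 - means[3]) / sigmas[3])
    return means[1] + sigmas[1] * z1_est


print(a1_23, beta12_3, beta13_2)
print(form1(35.0, 120.0), form2(35.0, 120.0), form3(35.0, 120.0))
```

All three forms return the same estimate of $X_1$ for the same pair of values of $X_2$ and $X_3$; only the units in which the plane is written differ.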

MULTIVARIATE FREQUENCY DISTRIBUTION 
The monovariate frequency distribution, it will be recalled, is the basis for the determination of various measures describing the central tendency and variation about the central tendency of a single variable. The bivariate frequency distribution (Chap. XIII) is the basis for the calculation of the lines of regression and the simple correlation $r$ as well as for the calculation of correlation ratios. In fact, the bivariate frequency distribution contained all the information regarding the joint variation or covariance of $X_1$ and $X_2$ and hence formed the basis for the calculation of any measure or law of relationship between these two variables, linear or otherwise. Similarly, the multivariate frequency distribution contains all the information about the covariance of $X_1$, $X_2$, $X_3$, etc., and it thus forms the basis for the calculation of any measure or law of relationship between the different variables, individually or in groups.

Fig. 117. A trivariate frequency distribution.

Figure 117 shows a trivariate frequency distribution in which each variable is grouped into three class intervals. A small number of class intervals is taken in order to simplify the diagram; in any actual problem, the number of class intervals would, of course, be larger.

The figure shows the frequency (the number written on the floor of each cubical cell) with which $X_1$ falls within a given class interval at the same time that $X_2$ falls within another given class interval and $X_3$ falls within a third given class interval. Accordingly, in Fig. 117, the frequency with which $X_1$ takes on values between 100 and 200 at the same time that $X_2$ takes on values between 1 and 2 and $X_3$ takes on values between 5 and 10 is 10. This is the frequency of the joint occurrence of the specified $X_1$, $X_2$, and $X_3$ values. The frequencies in other cells represent the frequency of joint occurrence of other $X_1$, $X_2$, and $X_3$ combinations.

If the frequencies of this trivariate frequency distribution are projected upon any one of the three reference planes, that is, if the frequencies are added from top to bottom, from left to right, or from front to rear, a bivariate frequency distribution is obtained for two of the three variables. For example, if the frequencies are projected upon the $X_2X_3$ plane, the bivariate frequency distribution for these two variables shown in Table 43 is obtained. In Table 43 the frequencies of the trivariate frequency distribution in Fig. 117 are added from top to bottom.

Table 43
[Bivariate frequency distribution of $X_2$ (horizontal scale 0, 1, 2, 3) and $X_3$ (vertical scale in steps of 5, up to 15). The cell frequencies are not legible.]

If the frequencies are projected upon the $X_1X_2$ plane, the bivariate frequency distribution of $X_1$ and $X_2$ shown in Table 44 is found. To obtain the frequencies in Table 44, the frequencies in the trivariate frequency distribution (Fig. 117) are added from front to rear.

Table 44
[Bivariate frequency distribution of $X_1$ (vertical scale 100, 200, 300) and $X_2$. The cell frequencies are not legible.]




Finally, if the frequencies are projected onto the $X_1X_3$ plane, the bivariate frequency distribution of $X_1$ and $X_3$ shown in Table 45 is obtained. The frequencies shown in Table 45 are the sum from left to right of the frequencies in the trivariate frequency distribution shown in Fig. 117.

Table 45
[Bivariate frequency distribution of $X_1$ (vertical scale 0, 100, 200, 300) and $X_3$ (horizontal scale 15, 10, 5, 0).]
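The three projections can be sketched with a small three-way table held as nested lists. The frequencies below are hypothetical (they are not those of Fig. 117, apart from placing 10 in the central cell as the text describes), and the index order `freq[i][j][k]` for the classes of $X_1$, $X_2$, $X_3$ is an assumption of the sketch.

```python
# freq[i][j][k]: frequency for class i of X1, class j of X2, class k of X3.
# Hypothetical figures; only the central cell (10) follows the text.
freq = [
    [[4, 2, 1], [3, 3, 1], [1, 2, 1]],   # X1 from 0 to 100
    [[2, 3, 1], [3, 10, 4], [1, 3, 3]],  # X1 from 100 to 200
    [[1, 1, 1], [1, 4, 3], [1, 3, 5]],   # X1 from 200 to 300
]
classes = range(3)

# Adding from top to bottom removes X1 and leaves the X2, X3 table.
x2x3 = [[sum(freq[i][j][k] for i in classes) for k in classes] for j in classes]

# Adding from front to rear removes X3 and leaves the X1, X2 table.
x1x2 = [[sum(freq[i][j][k] for k in classes) for j in classes] for i in classes]

# Adding from left to right removes X2 and leaves the X1, X3 table.
x1x3 = [[sum(freq[i][j][k] for j in classes) for k in classes] for i in classes]


def grand_total(table):
    return sum(sum(row) for row in table)


for name, table in (("X2 by X3", x2x3), ("X1 by X2", x1x2), ("X1 by X3", x1x3)):
    print(name, table, "total =", grand_total(table))
```

Each projection has the same grand total as the trivariate distribution, since every observation is counted once in each of them.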

In the three bivariate frequency distributions shown in Tables 43 to 45, it is to be noted that $X_2$ and $X_3$ are positively correlated, as are also $X_1$ and $X_2$ and $X_1$ and $X_3$. The given trivariate frequency distribution (Fig. 117) is one in which all the variables are positively correlated with each other. In this case, as the values of $X_2$ and $X_3$ both increase, the mean value of $X_1$ also tends to increase; in other words, the plane of regression of $X_1$ on $X_2$ and $X_3$ would slope upward from the origin in both the $X_2$ and the $X_3$ direction. Because of the all-round positive correlation between the variables, the other planes of regression would also slope upward from the origin in both directions.

The net regression between two variables in a multivariate distribution is measured by the $b$ statistic, and it is possible to have a negative net regression $b_{12.3}$ although the Pearsonian coefficient of correlation $r_{12}$ is positive, and vice versa. If $r_{12}$ is small compared with $r_{13}$ and $r_{23}$, the latter being either both negative or both positive, the plane of regression of $X_1$ on $X_2$ and $X_3$ may slope downward in the $X_2$ direction even if $r_{12}$ is positive. The statistic $b_{12.3}$ is of the same sign as $r_{12}$ so long as $r_{12} - r_{13}r_{23}$ is of the same sign as $r_{12}$. If this condition is not fulfilled, that is, if $r_{12} - r_{13}r_{23}$ and $r_{12}$ are of opposite sign, $b_{12.3}$ will be opposite in sign to $r_{12}$ and the plane will slope in the opposite direction from that indicated by the sign of $r_{12}$, which, when multiplied by the ratio $\sigma_1/\sigma_2$, describes the slope of the line of regression in the bivariate distribution of $X_1$ and $X_2$.* In the case where $r_{12}$ is positive but $b_{12.3}$ is negative, the coefficient of partial correlation $r_{12.3}$ is negative, agreeing with the sign of the $b$ statistic. For this reason, the partial-correlation coefficient may be said to measure the net correlation between the two variables.

* See pp. 349-351.

[Cell frequencies of Table 45, as printed: 11, 13, 5, 12, 18, 12, 6, 10, 6]
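The reversal of sign can be illustrated with hypothetical coefficients. The sketch assumes the standard three-variable expressions $\beta_{12.3} = (r_{12} - r_{13}r_{23})/(1 - r_{23}^2)$ and $r_{12.3} = (r_{12} - r_{13}r_{23})/\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}$, neither of which is stated in this passage; only the sign condition on $r_{12} - r_{13}r_{23}$ is taken from the text. The three $r$'s are invented, with $r_{12}$ small compared with $r_{13}$ and $r_{23}$.

```python
import math

# Hypothetical simple correlations: r12 is positive but small compared
# with r13 and r23, which are both positive.
r12, r13, r23 = 0.2, 0.8, 0.7

numerator = r12 - r13 * r23  # its sign decides the sign of b12.3

# Net regression coefficient in standard-deviation units (assumed formula).
beta12_3 = numerator / (1 - r23 ** 2)

# Partial-correlation coefficient (assumed formula).
r12_3 = numerator / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

print(f"r12      = {r12:+.4f}")
print(f"beta12.3 = {beta12_3:+.4f}")
print(f"r12.3    = {r12_3:+.4f}")
```

Since $b_{12.3}$ is $\beta_{12.3}$ multiplied by the positive ratio $\sigma_1/\sigma_2$, it carries the same negative sign, and the partial-correlation coefficient agrees with it even though the simple coefficient $r_{12}$ is positive.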

If the net correlations between $X_1$ and $X_2$ and between $X_1$ and $X_3$ are both negative, the plane of regression of $X_1$ on $X_2$ and $X_3$ slopes downward in both directions. In this instance, the $b_{12.3}$ and the $b_{13.2}$ of the regression equation are both negative. In other words, the mean value of $X_1$ would tend to decrease with increases in the values of both $X_2$ and $X_3$. This particular plane of regression would have an all-round negative slope. If the net correlation between $X_1$ and $X_3$ is negative, however, and that between $X_1$ and $X_2$ is positive, the plane of regression of $X_1$ on $X_2$ and $X_3$ slopes upward in the $X_2$ direction, that is, the mean value of $X_1$ increases as $X_2$ increases, and the plane slopes downward in the $X_3$ direction, that is, the mean value of $X_1$ declines as $X_3$ increases. In this instance, $b_{12.3}$ is positive, and $b_{13.2}$ is negative. The plane of regression shows a positive relationship in one direction and a negative relationship in the other direction.

These are a few of the possible forms that a trivariate frequency distribution might take. Others include nonlinear relationships. For example, the mean value of $X_1$ might first increase as $X_3$ increases and also as $X_2$ increases and then later decrease as both these variables continued to increase, or $X_1$ might decline in the $X_2$ direction after a certain point but continue to rise in the $X_3$ direction. For either of these combinations, a curved plane or surface of regression would give a better fit than a straight plane.

In order that there be all-round independence, that is to say, absolutely no correlation whatsoever, either linear or nonlinear, between any of the variables, the following conditions must exist:

1. The distribution of $X_1$ for any given $X_2$ and $X_3$ class intervals, that is, the distribution of $X_1$ values for any given vertical shaft, must be of the same form; it must have the same mean, the same standard deviation, etc., even though it does not have the same number of cases, as the distribution of $X_1$ values for every other vertical shaft of the trivariate frequency distribution (see Fig. 117).

2. The distribution of $X_2$ values for any given $X_1$ and $X_3$ class interval, that is, the distribution of $X_2$ values in any given horizontal shaft parallel to the $X_2$-axis and perpendicular to the $X_1X_3$ plane, must be of the same form as the distribution of $X_2$ values in every other horizontal shaft parallel to the $X_2$-axis (see Fig. 117).

3. The distribution of $X_3$ values for any given $X_1$ and $X_2$ class interval, that is, the distribution of $X_3$ values in any given horizontal shaft parallel to the $X_3$-axis and perpendicular to the $X_1X_2$ plane, must be of the same form as the distribution of $X_3$ values in every other shaft parallel to the $X_3$-axis and perpendicular to the $X_1X_2$ plane (see Fig. 117).
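The three conditions can be checked mechanically on a grouped table. The sketch below is illustrative only: the marginal frequencies are hypothetical, the function names are invented, and "same form" is interpreted as equal relative frequencies in every shaft, which in turn implies equal means and standard deviations. It assumes every shaft contains at least one case.

```python
def shaft(freq, axis, a, b):
    """Frequencies along one axis, with the other two class indexes fixed."""
    dims = (len(freq), len(freq[0]), len(freq[0][0]))
    if axis == 0:                       # X1 shaft for X2 class a, X3 class b
        return [freq[i][a][b] for i in range(dims[0])]
    if axis == 1:                       # X2 shaft for X1 class a, X3 class b
        return [freq[a][j][b] for j in range(dims[1])]
    return [freq[a][b][k] for k in range(dims[2])]  # X3 shaft


def all_round_independent(freq, tol=1e-9):
    """True if, along every axis, all shafts have the same relative frequencies."""
    dims = (len(freq), len(freq[0]), len(freq[0][0]))
    for axis in range(3):
        others = [d for ax, d in enumerate(dims) if ax != axis]
        reference = None
        for a in range(others[0]):
            for b in range(others[1]):
                cells = shaft(freq, axis, a, b)
                total = sum(cells)      # assumed nonzero
                shares = [c / total for c in cells]
                if reference is None:
                    reference = shares
                elif any(abs(s - t) > tol for s, t in zip(shares, reference)):
                    return False
    return True


# Hypothetical marginal frequencies, each summing to N = 100.
f1, f2, f3 = [20, 50, 30], [25, 50, 25], [40, 40, 20]
n = sum(f1)

# A table built as the product of its marginals satisfies all three conditions.
independent_table = [[[a * b * c / n ** 2 for c in f3] for b in f2] for a in f1]

# Disturbing a single cell destroys the all-round independence.
dependent_table = [[row[:] for row in plane] for plane in independent_table]
dependent_table[0][0][0] += 5

print(all_round_independent(independent_table))
print(all_round_independent(dependent_table))
```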

A close study of a multivariate frequency distribution is therefore always desirable before attempting to calculate any measure of relationship. Since in some instances the net correlation may be of opposite sign from simple linear correlation, as illustrated above, the examination of separate bivariate distributions for each pair of variables is not always a reliable method. It is better to undertake a study of the multivariate distribution. In a trivariate problem a diagram similar to Fig. 117 could be set up, but for a large number of class intervals it would be extremely difficult, if not impossible, to draw. The multivariate distribution can be studied, however, by selecting all the $X_1$ and $X_2$ variates associated with a given range of $X_3$ variates, for example, in Fig. 117, all the $X_1$ and $X_2$ variates associated with values of $X_3$ from 5 to 10. In this manner, a series of frequency distributions of $X_1$ for varying values of $X_2$ is obtained.

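Table 46. Values of $X_1$ and $X_2$ Associated with Values of $X_3$ between 5 and 10
[Bivariate frequency distribution of $X_1$ (vertical scale 0, 100, 200) and $X_2$ (horizontal scale 0, 1, 2, 3). The cell frequencies are not legible.]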

Similar tables could be constructed showing the values of $X_1$ and $X_2$ associated with values of $X_3$ between 0 and 5 and with values of $X_3$ between 10 and 15. In this manner, the net correlation between $X_1$ and $X_2$ can be studied, and, by a similar procedure, the net correlations between $X_1$ and $X_3$ and between $X_2$ and $X_3$ can be examined. If such a study should reveal that linear relationships prevail, the methods to be discussed in the ensuing sections could be applied. If simple curvilinear relationships are apparent, some curved plane might better be fitted instead of a straight plane. In some instances, the latter could be accomplished by using logarithms, reciprocals, or some other transformation of the variables, to which linear functions could be fitted; in other instances, it might be necessary to fit parabolic functions to the original units.

MULTIPLE LINEAR REGRESSION 

The $a$'s, $b$'s, and $\beta$'s of a linear plane of regression are 
calculated in terms of given data or of quantities easily calculated 
from the data. The $\beta$'s can be evaluated in terms of the simple 
correlation coefficients, the $r$'s; knowledge of these therefore 
permits the immediate calculation of the former. The $b$'s can 
be computed readily from the $\beta$'s by multiplying by the proper 
ratio of standard deviations [see Eqs. (4)]. Finally, the $a$'s can 
be computed from the $b$'s and the means of the different variables. 

The common method of evaluating the $\beta$'s is the method of 
least squares. It was pointed out that for three variables there 
are three planes of regression. Values for the regression 
statistics in the regression equation of $X_1$ on $X_2$ and $X_3$ are derived 
by minimizing the sum of the squares of the deviations of the 
actual values of $X_1$ from those ($X_1'$) given by the plane of 
regression, that is, by minimizing $\sum (X_1 - X_1')^2$. Similarly, values for 
the regression statistics in the regression equation of $X_2$ on $X_1$ 
and $X_3$ are derived by minimizing the sum of the squares of the 
deviations of $X_2$ from those ($X_2'$) given by the second plane of 
regression, that is, by minimizing $\sum (X_2 - X_2')^2$. Finally, the 
values of the regression statistics in the regression equation of 
$X_3$ on $X_1$ and $X_2$ are derived by minimizing $\sum (X_3 - X_3')^2$. All 
three planes of regression are thus fitted by the method of least 
squares, but in each case the sum of the squares of a different 
set of deviations is minimized. 
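The least-squares fit of the first plane can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `fit_plane` and `sum_sq` and the data are hypothetical, and the two minimizing conditions are solved directly in original units rather than in the standardized form used in the text.

```python
def fit_plane(x1, x2, x3):
    """Least-squares plane X1' = a + b2*X2 + b3*X3 (illustrative helper)."""
    n = len(x1)
    m1, m2, m3 = sum(x1) / n, sum(x2) / n, sum(x3) / n
    # Work with deviations from the means, as the text does.
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    d3 = [v - m3 for v in x3]
    s22 = sum(u * u for u in d2)
    s33 = sum(u * u for u in d3)
    s23 = sum(u * v for u, v in zip(d2, d3))
    s12 = sum(u * v for u, v in zip(d1, d2))
    s13 = sum(u * v for u, v in zip(d1, d3))
    # Setting both partial derivatives of the sum of squares to zero gives
    # two linear equations in b2 and b3, solved here by Cramer's rule.
    det = s22 * s33 - s23 ** 2
    b2 = (s12 * s33 - s13 * s23) / det
    b3 = (s13 * s22 - s12 * s23) / det
    a = m1 - b2 * m2 - b3 * m3
    return a, b2, b3


def sum_sq(x1, x2, x3, a, b2, b3):
    """Sum of squared deviations of X1 from the plane a + b2*X2 + b3*X3."""
    return sum((y - (a + b2 * u + b3 * v)) ** 2 for y, u, v in zip(x1, x2, x3))


# Hypothetical data, for illustration only.
X2 = [1, 2, 3, 4, 5, 6]
X3 = [2, 1, 4, 3, 6, 5]
X1 = [4, 7, 6, 12, 10, 15]

a, b2, b3 = fit_plane(X1, X2, X3)
best = sum_sq(X1, X2, X3, a, b2, b3)
# Any other plane, such as one with a slightly different slope,
# should leave a larger sum of squared deviations.
worse = sum_sq(X1, X2, X3, a, b2 + 0.1, b3)
print(worse > best)
```

Fitting the plane of $X_2$ on $X_1$ and $X_3$ would call the same helper with the arguments reordered, which is the sense in which each plane minimizes a different set of deviations.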

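Using the third form of regression equation, the values of the 
statistics for the plane of regression of $X_1$ on $X_2$ and $X_3$ are 
derived as follows: 

In the equation for the plane of regression, 

$$\frac{x_1'}{\sigma_1} = \beta_{12.3}\frac{x_2}{\sigma_2} + \beta_{13.2}\frac{x_3}{\sigma_3}$$

the problem is to determine $\beta_{12.3}$ and $\beta_{13.2}$ such that 
$\sum (X_1 - X_1')^2$ is a minimum. Since 
$\sum \left(\frac{x_1}{\sigma_1} - \frac{x_1'}{\sigma_1}\right)^2$ is merely 

$$\frac{1}{\sigma_1^2}\sum \left[(X_1 - \bar{X}_1) - (X_1' - \bar{X}_1)\right]^2 = \frac{1}{\sigma_1^2}\sum (X_1 - X_1')^2$$

it follows that $\sum (X_1 - X_1')^2$ will be a minimum when 

$$\sum \left(\frac{x_1}{\sigma_1} - \beta_{12.3}\frac{x_2}{\sigma_2} - \beta_{13.2}\frac{x_3}{\sigma_3}\right)^2$$

is a minimum, and hence the plane of regression derived by 
minimizing the latter is the same as that derived by minimizing 
the former. 

If $\sum \left(\frac{x_1}{\sigma_1} - \beta_{12.3}\frac{x_2}{\sigma_2} - \beta_{13.2}\frac{x_3}{\sigma_3}\right)^2$ 
is to be a minimum, the derivative of this sum with respect to 
$\beta_{12.3}$ must equal zero and also its derivative with respect to 
$\beta_{13.2}$ must equal zero. These conditions are expressed in the 
following equations: 

$$\sum \frac{x_2}{\sigma_2}\left(\frac{x_1}{\sigma_1} - \beta_{12.3}\frac{x_2}{\sigma_2} - \beta_{13.2}\frac{x_3}{\sigma_3}\right) = 0$$

$$\sum \frac{x_3}{\sigma_3}\left(\frac{x_1}{\sigma_1} - \beta_{12.3}\frac{x_2}{\sigma_2} - \beta_{13.2}\frac{x_3}{\sigma_3}\right) = 0 \tag{6}$$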

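If in these equations the indicated multiplication is carried out 
and if each equation is divided by $N$, they become 

$$\frac{\sum x_1x_2}{N\sigma_1\sigma_2} - \beta_{12.3}\frac{\sum x_2^2}{N\sigma_2^2} - \beta_{13.2}\frac{\sum x_2x_3}{N\sigma_2\sigma_3} = 0$$

$$\frac{\sum x_1x_3}{N\sigma_1\sigma_3} - \beta_{12.3}\frac{\sum x_2x_3}{N\sigma_2\sigma_3} - \beta_{13.2}\frac{\sum x_3^2}{N\sigma_3^2} = 0 \tag{7}$$

But it will be noted that $\frac{\sum x_1x_2}{N\sigma_1\sigma_2} = r_{12}$, 
$\frac{\sum x_1x_3}{N\sigma_1\sigma_3} = r_{13}$, 
$\frac{\sum x_2x_3}{N\sigma_2\sigma_3} = r_{23}$, 
$\frac{\sum x_2^2}{N} = \sigma_2^2$, and $\frac{\sum x_3^2}{N} = \sigma_3^2$. 
Hence Eqs. (7) reduce to the following: 

$$r_{12} - \beta_{12.3} - \beta_{13.2}r_{23} = 0$$

$$r_{13} - \beta_{12.3}r_{23} - \beta_{13.2} = 0 \tag{8}$$

When solved by the ordinary method of simultaneous equations, 

$$\beta_{12.3} = \frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2} \qquad \beta_{13.2} = \frac{r_{13} - r_{12}r_{23}}{1 - r_{23}^2} \tag{9}$$

From Eqs. (9) it will be noted that, when $r_{23} = 0$, $\beta_{12.3} = r_{12}$ 
and $\beta_{13.2} = r_{13}$.* 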
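Equations (9) translate directly into code. The following is a minimal sketch; the function name `beta_weights` and the sample correlations are hypothetical.

```python
def beta_weights(r12, r13, r23):
    """Return (beta_12.3, beta_13.2) from the simple correlations, per Eqs. (9)."""
    denom = 1.0 - r23 ** 2
    if denom <= 0.0:
        # X2 and X3 perfectly correlated: the plane is not uniquely determined.
        raise ValueError("r23 must be strictly between -1 and 1")
    beta12_3 = (r12 - r13 * r23) / denom
    beta13_2 = (r13 - r12 * r23) / denom
    return beta12_3, beta13_2


# Hypothetical correlations, for illustration only.
b12, b13 = beta_weights(0.7, 0.5, 0.4)
print(b12, b13)

# With r23 = 0 the betas equal the simple correlations.
print(beta_weights(0.6, 0.5, 0.0))
```

A quick check on any result is to substitute the two values back into Eqs. (8); both left-hand sides should vanish apart from rounding.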

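If the other planes of regression are put in the form 

$$\frac{x_2'}{\sigma_2} = \beta_{21.3}\frac{x_1}{\sigma_1} + \beta_{23.1}\frac{x_3}{\sigma_3}$$

$$\frac{x_3'}{\sigma_3} = \beta_{31.2}\frac{x_1}{\sigma_1} + \beta_{32.1}\frac{x_2}{\sigma_2}$$

and the values of the $\beta$'s are determined in the same manner as 
the values of $\beta_{12.3}$ and $\beta_{13.2}$ were determined, the following results 
are obtained: 

$$\beta_{21.3} = \frac{r_{12} - r_{13}r_{23}}{1 - r_{13}^2} \qquad \beta_{23.1} = \frac{r_{23} - r_{12}r_{13}}{1 - r_{13}^2} \tag{9'}$$

$$\beta_{31.2} = \frac{r_{13} - r_{12}r_{23}}{1 - r_{12}^2} \qquad \beta_{32.1} = \frac{r_{23} - r_{12}r_{13}}{1 - r_{12}^2} \tag{9''}$$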

If the simple linear-correlation coefficients are known, 
therefore, it is possible to obtain all the $\beta$'s that enter into the three 
multiple-regression equations; and the regression equations in the 
$\beta$ form are thus determined. The other forms of the regression 
equations can be derived from the $\beta$ form by calculating the 
$b$'s from the $\beta$'s, and the $a$'s from the $b$'s and the means. For 
example, Eqs. (4), such as $b_{12.3} = \beta_{12.3}\frac{\sigma_1}{\sigma_2}$, will give the values of 
the $b$'s. The regression equations in the form 

$$x_1' = b_{12.3}x_2 + b_{13.2}x_3$$

are then determined. If, in this latter form, $X_1' - \bar{X}_1$ is 
substituted for $x_1'$, $X_2 - \bar{X}_2$ for $x_2$, and $X_3 - \bar{X}_3$ for $x_3$, the equation 
becomes 

$$X_1' = \bar{X}_1 - b_{12.3}\bar{X}_2 - b_{13.2}\bar{X}_3 + b_{12.3}X_2 + b_{13.2}X_3$$

from which it may be seen that the value of $a_{1.23}$ is as follows: 

$$a_{1.23} = \bar{X}_1 - b_{12.3}\bar{X}_2 - b_{13.2}\bar{X}_3 \tag{10}$$

Similarly, the value of $a$ for the other regression equations is 
found to be as follows: 

$$a_{2.13} = \bar{X}_2 - b_{21.3}\bar{X}_1 - b_{23.1}\bar{X}_3 \tag{10'}$$

$$a_{3.12} = \bar{X}_3 - b_{31.2}\bar{X}_1 - b_{32.1}\bar{X}_2 \tag{10''}$$

It is helpful in the use of these equations to remember the 
symmetry in the notation, that is, the symmetry in the position 
of the subscripts. 

* See pp. 417 and 421. 
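The passage from the $\beta$ form to the $b$ and $a$ forms can be sketched as follows. The helper name `raw_coefficients` and all numerical inputs are hypothetical; the code assumes the $\beta$'s, standard deviations, and means have already been computed.

```python
def raw_coefficients(beta12_3, beta13_2, s1, s2, s3, m1, m2, m3):
    """Convert standardized betas to b's (Eqs. (4)) and the constant a (Eq. (10)).

    s1, s2, s3 are the standard deviations and m1, m2, m3 the means
    of X1, X2, and X3.
    """
    b12_3 = beta12_3 * s1 / s2
    b13_2 = beta13_2 * s1 / s3
    a1_23 = m1 - b12_3 * m2 - b13_2 * m3
    return a1_23, b12_3, b13_2


# Hypothetical inputs, for illustration only.
a, b2, b3 = raw_coefficients(0.60, 0.26, s1=10.0, s2=4.0, s3=5.0,
                             m1=50.0, m2=20.0, m3=30.0)

def estimate(x2, x3):
    """Estimated X1 from the plane in original units."""
    return a + b2 * x2 + b3 * x3

# The plane passes through the point of means.
print(estimate(20.0, 30.0))
```

The other two planes follow by permuting the arguments in the same way that the subscripts are permuted in Eqs. (10') and (10'').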

Second-order Variances for Linear Plane of Regression. The 
formulas derived below measure the dispersion of the individual 
items about the plane of regression fitted by the method of 
least squares. As in the simpler case of the line of regression, so 
also for the plane of regression, the mathematical procedure 
consists in finding the standard deviation of the deviations of actual 
$X$ values from the estimated values ($X'$) given by the planes of 
regression. For example, by definition, $\sigma_{1.23}^2 = \sum (X_1 - X_1')^2/N$. 
The task reduces to one of evaluating such expressions in terms 
of quantities already known, that is, the $r$'s and the $\beta$'s. This 
can be done as follows: 

Since it has been found easier to work with the variables when 
they are converted into deviations from their respective means 
and expressed in terms of their standard deviations, the formula 
for $\sigma_{1.23}^2$ will first be put in that form. This can be done by 
subtracting $\bar{X}_1$ from both $X_1$ and $X_1'$ and by multiplying both 
numerator and denominator by $\sigma_1^2$, neither of which will affect 
the value of the expression. Thus, 

$$\sigma_{1.23}^2 = \frac{\sum \left[(X_1 - \bar{X}_1) - (X_1' - \bar{X}_1)\right]^2}{N} = \frac{\sigma_1^2 \sum \left(\frac{x_1}{\sigma_1} - \frac{x_1'}{\sigma_1}\right)^2}{N} = \frac{\sigma_1^2 \sum d^2}{N} \tag{11}$$

The problem is to evaluate $\sum d^2$. 

By Eqs. (3), the third form of the regression equation of $X_1$ on 
$X_2$ and $X_3$, it follows that 

$$d = \frac{x_1}{\sigma_1} - \frac{x_1'}{\sigma_1} = \frac{x_1}{\sigma_1} - \beta_{12.3}\frac{x_2}{\sigma_2} - \beta_{13.2}\frac{x_3}{\sigma_3} \tag{12}$$

Accordingly, for any given set of values of $x_1$, $x_2$, and $x_3$, there 
corresponds a particular value for $d$, which is the deviation of 
the actual value of $x_1$ from the value of $x_1'$ obtained by putting 
the given values of $x_2$ and $x_3$ in the regression equation, expressed 
in units of $\sigma_1$. There are just as many $d$'s, therefore, as there are 
different sets of values of $x_1$, $x_2$, and $x_3$. If any one of these $d$'s is 
squared, Eq. (12) gives 

$$d^2 = \frac{x_1 d}{\sigma_1} - \beta_{12.3}\frac{x_2 d}{\sigma_2} - \beta_{13.2}\frac{x_3 d}{\sigma_3} \tag{13}$$

If all values of $d$ are squared and summed, the following 
equation results: 

$$\sum d^2 = \sum \frac{x_1 d}{\sigma_1} - \beta_{12.3}\sum \frac{x_2 d}{\sigma_2} - \beta_{13.2}\sum \frac{x_3 d}{\sigma_3} \tag{14}$$

But from Eqs. (6) and (12), it will be noted that 

$$\sum \frac{x_2 d}{\sigma_2} = \sum \frac{x_2}{\sigma_2}\left(\frac{x_1}{\sigma_1} - \beta_{12.3}\frac{x_2}{\sigma_2} - \beta_{13.2}\frac{x_3}{\sigma_3}\right) = 0$$

and 

$$\sum \frac{x_3 d}{\sigma_3} = \sum \frac{x_3}{\sigma_3}\left(\frac{x_1}{\sigma_1} - \beta_{12.3}\frac{x_2}{\sigma_2} - \beta_{13.2}\frac{x_3}{\sigma_3}\right) = 0$$

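Therefore, 

$$\sum d^2 = \sum \frac{x_1 d}{\sigma_1} \tag{15}$$

The evaluation, therefore, of $\sum \frac{x_1 d}{\sigma_1}$ will solve the problem. 
This can be done as follows: 

If each $d$, as shown in Eq. (12), is multiplied by the $x_1/\sigma_1$ to 
which that $d$ belongs, the following result is obtained: 

$$\frac{x_1 d}{\sigma_1} = \frac{x_1^2}{\sigma_1^2} - \beta_{12.3}\frac{x_1x_2}{\sigma_1\sigma_2} - \beta_{13.2}\frac{x_1x_3}{\sigma_1\sigma_3} \tag{16}$$

Values of $\frac{x_1 d}{\sigma_1}$ for all values of $d$ and $x_1$ sum up as follows: 

$$\sum \frac{x_1 d}{\sigma_1} = \sum d^2 = \frac{\sum x_1^2}{\sigma_1^2} - \beta_{12.3}\frac{\sum x_1x_2}{\sigma_1\sigma_2} - \beta_{13.2}\frac{\sum x_1x_3}{\sigma_1\sigma_3} \tag{17}$$

Hence, dividing by $N$, 

$$\frac{\sum d^2}{N} = \frac{\sum x_1^2}{N\sigma_1^2} - \beta_{12.3}\frac{\sum x_1x_2}{N\sigma_1\sigma_2} - \beta_{13.2}\frac{\sum x_1x_3}{N\sigma_1\sigma_3}$$

But since 

$$\frac{\sum x_1^2}{N} = \sigma_1^2 \qquad \frac{\sum x_1x_2}{N\sigma_1\sigma_2} = r_{12} \qquad \frac{\sum x_1x_3}{N\sigma_1\sigma_3} = r_{13}$$

it follows that 

$$\frac{\sum d^2}{N} = 1 - \beta_{12.3}r_{12} - \beta_{13.2}r_{13} \tag{18}$$

and, finally, from Eqs. (11) and (18), 

$$\sigma_{1.23}^2 = \frac{\sigma_1^2 \sum d^2}{N} = \sigma_1^2(1 - \beta_{12.3}r_{12} - \beta_{13.2}r_{13}) \tag{19}$$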


This gives an easy method for evaluating $\sigma_{1.23}^2$ when the $r$'s 
and $\beta$'s have been calculated. Similar formulas for evaluating 
$\sigma_{2.13}^2$ and $\sigma_{3.12}^2$, the scatters about the other planes of regression, 
are found to be as follows:* 

$$\sigma_{2.13}^2 = \sigma_2^2(1 - \beta_{21.3}r_{12} - \beta_{23.1}r_{23}) \tag{19'}$$

$$\sigma_{3.12}^2 = \sigma_3^2(1 - \beta_{31.2}r_{13} - \beta_{32.1}r_{23}) \tag{19''}$$

Note the symmetry of these three equations. 
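Equation (19) can be checked numerically against the definition $\sigma_{1.23}^2 = \sum (X_1 - X_1')^2/N$. The sketch below uses hypothetical data and helper names (`corr`, `pvar`, `residual_variance_eq19`, `residual_variance_direct`); standard deviations are computed with divisor $N$, as in the text.

```python
import math


def pvar(u):
    """Variance with divisor N."""
    m = sum(u) / len(u)
    return sum((a - m) ** 2 for a in u) / len(u)


def corr(u, v):
    """Simple (zero-order) correlation coefficient."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def betas(x1, x2, x3):
    """beta_12.3 and beta_13.2 by Eqs. (9), with the r's they use."""
    r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)
    den = 1.0 - r23 ** 2
    return (r12 - r13 * r23) / den, (r13 - r12 * r23) / den, r12, r13


def residual_variance_eq19(x1, x2, x3):
    """sigma_1.23 squared from Eq. (19)."""
    b12, b13, r12, r13 = betas(x1, x2, x3)
    return pvar(x1) * (1.0 - b12 * r12 - b13 * r13)


def residual_variance_direct(x1, x2, x3):
    """sigma_1.23 squared from the definition, using the fitted plane."""
    n = len(x1)
    b12, b13, _, _ = betas(x1, x2, x3)
    s1, s2, s3 = (math.sqrt(pvar(v)) for v in (x1, x2, x3))
    b2, b3 = b12 * s1 / s2, b13 * s1 / s3          # Eqs. (4)
    a = sum(x1) / n - b2 * sum(x2) / n - b3 * sum(x3) / n  # Eq. (10)
    return sum((y - (a + b2 * u + b3 * v)) ** 2
               for y, u, v in zip(x1, x2, x3)) / n


# Hypothetical data, for illustration only.
X1 = [4, 7, 6, 12, 10, 15]
X2 = [1, 2, 3, 4, 5, 6]
X3 = [2, 1, 4, 3, 6, 5]
print(residual_variance_eq19(X1, X2, X3))
print(residual_variance_direct(X1, X2, X3))
```

The two printed values should agree apart from rounding, which is the content of the derivation leading to Eq. (19).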

COEFFICIENT OF MULTIPLE CORRELATION 


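The multiple-correlation coefficient measures the correlation 
between the dependent variable and the two independent 
variables taken together. For reasons previously indicated,¹ 

$$R_{1.23} = \sqrt{1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}}$$

may be taken as a good measure of multiple correlation. 

* The dispersion $\sigma_{1.23}^2$ may also be calculated from the formulas 

$$\sigma_{1.23}^2 = \sigma_1^2(1 - r_{12}^2)(1 - r_{13.2}^2) \quad\text{and}\quad \sigma_{1.23}^2 = \sigma_{1.2}^2(1 - r_{13.2}^2)$$

This may be demonstrated as follows: From Eq. (23'), 

$$r_{13.2} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}}$$

This gives 

$$(1 - r_{12}^2)(1 - r_{13.2}^2) = \frac{(1 - r_{12}^2)(1 - r_{23}^2) - (r_{13} - r_{12}r_{23})^2}{1 - r_{23}^2}$$

$$= \frac{1 - r_{12}^2 - r_{23}^2 + r_{12}^2r_{23}^2 - r_{13}^2 - r_{12}^2r_{23}^2 + 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}$$

$$= \frac{(1 - r_{23}^2) - (r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23})}{1 - r_{23}^2}$$

$$= 1 - r_{12}\frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2} - r_{13}\frac{r_{13} - r_{12}r_{23}}{1 - r_{23}^2}$$

Equation (9), however, shows that the two fractions on the right are equal 
respectively to $\beta_{12.3}$ and $\beta_{13.2}$. Hence 

$$(1 - r_{12}^2)(1 - r_{13.2}^2) = 1 - r_{12}\beta_{12.3} - r_{13}\beta_{13.2}$$

or, on making use of Eq. (19), 

$$\sigma_{1.23}^2 = \sigma_1^2(1 - r_{12}^2)(1 - r_{13.2}^2)$$

In Chap. VIII (p. 321) it was shown that $\sigma_{1.2}^2 = \sigma_1^2(1 - r_{12}^2)$. Hence the 
last equation may be written 

$$\sigma_{1.23}^2 = \sigma_{1.2}^2(1 - r_{13.2}^2)$$

Thus both of the original formulas are derived from previously 
demonstrated relationships. Similar formulas hold for $\sigma_{2.13}^2$ and $\sigma_{3.12}^2$. These are 

$$\sigma_{2.13}^2 = \sigma_2^2(1 - r_{12}^2)(1 - r_{23.1}^2)$$

and 

$$\sigma_{3.12}^2 = \sigma_3^2(1 - r_{13}^2)(1 - r_{23.1}^2)$$

¹ See pp. 351-353 and 365-368. 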

$R_{1.23}$ measures the degree of association between the $X_1$ 
variable and $X_2$ and $X_3$ taken jointly. It can also be looked 
upon as a measure of the goodness of fit of the plane of regression 
of $X_1$ on $X_2$ and $X_3$ to the set of $X_1$ values. For if the fit is 
perfect, $\sigma_{1.23}$ will be zero and hence $R_{1.23}$ will equal 1. 
Similarly, $R_{2.13}$ measures the degree of association between the $X_2$ 
variable and $X_1$ and $X_3$ taken jointly, and $R_{3.12}$ measures the 
association between the $X_3$ variable and $X_1$ and $X_2$ taken jointly. 
They also can be looked upon as measures of goodness of fit of 
their respective planes of regression. It will be recalled that all 
three of these multiple-correlation coefficients may have 
different values. 
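Combining the definition of $R_{1.23}$ with Eq. (19) gives $R_{1.23}^2 = \beta_{12.3}r_{12} + \beta_{13.2}r_{13}$, so all three multiple-correlation coefficients can be obtained from the three simple $r$'s. The following sketch assumes hypothetical correlations; the function name `multiple_r` is illustrative.

```python
import math


def multiple_r(r_da, r_db, r_ab):
    """Multiple correlation of a dependent variable d with a and b jointly.

    r_da, r_db: simple correlations of d with each independent variable.
    r_ab: simple correlation between the two independent variables.
    """
    den = 1.0 - r_ab ** 2
    beta_a = (r_da - r_db * r_ab) / den
    beta_b = (r_db - r_da * r_ab) / den
    r_squared = beta_a * r_da + beta_b * r_db
    # Guard against tiny negative values produced by rounding.
    return math.sqrt(max(r_squared, 0.0))


# Hypothetical simple correlations, for illustration only.
r12, r13, r23 = 0.7, 0.5, 0.4

R1_23 = multiple_r(r12, r13, r23)   # X1 dependent
R2_13 = multiple_r(r12, r23, r13)   # X2 dependent
R3_12 = multiple_r(r13, r23, r12)   # X3 dependent
print(R1_23, R2_13, R3_12)
```

The three results generally differ, since each refers to the fit of a different plane of regression.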

The multiple coefficient of correlation $R_{1.23}$ is always larger 
than or at least equal to $r_{12}$ and $r_{13}$ (taken without regard to sign); 
for it stands to reason that $X_1$ 
can be estimated better (or at least no more poorly) from two 
variables $X_2$ and $X_3$ than from $X_2$ alone or $X_3$ alone. Similarly, 
$R_{2.13}$ is greater than, or at least equal to, $r_{12}$ and $r_{23}$; and $R_{3.12}$ is 
greater than, or at least equal to, $r_{13}$ and $r_{23}$. Furthermore, 
$R_{1.23}^2$ is equal to the sum of $r_{12}^2$ and $r_{13}^2$ if $X_2$ and $X_3$ are independent 
of each other; for, by Eq. (19), it follows that 

$$R_{1.23}^2 = 1 - \frac{\sigma_{1.23}^2}{\sigma_1^2} = 1 - \frac{\sigma_1^2(1 - \beta_{12.3}r_{12} - \beta_{13.2}r_{13})}{\sigma_1^2} = \beta_{12.3}r_{12} + \beta_{13.2}r_{13} \tag{20}$$

If $X_2$ and $X_3$ are independent of each other, $r_{23} = 0$; and, by 
Eqs. (9), $\beta_{12.3} = r_{12}$ and $\beta_{13.2} = r_{13}$. Accordingly, if $r_{23} = 0$, 

$$R_{1.23}^2 = r_{12}^2 + r_{13}^2 \tag{21}$$

Similarly, if $r_{13} = 0$, 

$$R_{2.13}^2 = r_{12}^2 + r_{23}^2 \tag{21'}$$

and if $r_{12} = 0$, 

$$R_{3.12}^2 = r_{13}^2 + r_{23}^2 \tag{21''}$$

Consequently, by adding to the regression equation a second 
variable that is independent of the first, the accuracy with which 
the dependent variable can be estimated, as measured by $R^2$, is 
increased by the square of the correlation between that variable 
and the newly added variable. 
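A short worked example may make Eq. (21) concrete. The correlations below are hypothetical and chosen only for ease of arithmetic.

```latex
% Hypothetical values: r_{12} = 0.6, r_{13} = 0.5, r_{23} = 0.
\begin{align*}
\beta_{12.3} &= \frac{0.6 - (0.5)(0)}{1 - 0^2} = 0.6, &
\beta_{13.2} &= \frac{0.5 - (0.6)(0)}{1 - 0^2} = 0.5, \\
R_{1.23}^2 &= (0.6)(0.6) + (0.5)(0.5) = 0.36 + 0.25 = 0.61, &
R_{1.23} &= \sqrt{0.61} \approx 0.78 .
\end{align*}
```

With $X_2$ alone the proportion of variance accounted for would be $r_{12}^2 = 0.36$; adding the uncorrelated $X_3$ raises it by $r_{13}^2 = 0.25$.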

It should be noted that only in special instances can a definite 
sign be given the multiple-correlation coefficient, although it is 
usually assumed to be inherently positive. For, as was indicated 
above, it may happen that the plane of regression to which a 
given multiple-correlation coefficient pertains may slope upward 
in one direction and downward in another direction, indicating 
a positive relationship between the dependent variable and one 
independent variable and a negative relationship between the 
dependent variable and the other independent variable. In 
such an instance, the correlation between the dependent variable 
and the two independent variables taken jointly, that is, the 
multiple correlation, cannot be said to be either positive or 
negative. For such a multiple-correlation coefficient, no sign is 
attached. It is only when the dependent variable is positively 
or negatively correlated with each and every one of the 
independent variables that the multiple-correlation coefficient can be 
given a positive or negative sign. 

COEFFICIENT OF PARTIAL CORRELATION 

In the preceding sections of this chapter the discussion has 
centered on the problem of estimating the value of one variable 
from one or more other variables by means of a regression 
equation. In connection with this problem, a coefficient measuring 
the degree of association between the dependent variable and the 
independent variables as a group was evaluated to show the 
accuracy with which such estimates can be made. This 
coefficient is a measure of the goodness of fit of the plane of 
regression. 

When there are interrelationships among three or more 
variables, another problem appears. It often happens that an 
apparent relationship between two variables is in reality the 
result of their individual connection with a third variable that 
commonly affects them both. For example, it may be that the 
correlation between the price of cotton and the price of rayon 
is due largely to the correlation of each of them with the index 
of wholesale prices. In other words, the concomitant 
movements in the prices of cotton and rayon may be due, 
fundamentally, not to any direct relationship between these two 
competing commodities, but primarily to their common 
tendency to rise and fall with the general price level; they may be 
joint effects of a common cause. Similarly, the concomitant 
variations in first- and second-semester English grades of 
freshmen in a woman's college may be basically accounted for by 
their respective relationships to the grades attained by the same 
freshmen in verbal scholastic-aptitude tests or to their school 
records. 

The statistical device for discovering how much correlation 
there is between one variable and another variable when a third 
variable or a number of other variables are "held constant" is 
the method of "partial correlation." The correlation between 
the freshmen grades in second-semester English, $X_1$, and the 
freshmen grades in first-semester English, $X_2$, when the grades 
of the respective freshmen in verbal scholastic-aptitude tests, $X_3$, 
are held constant is the partial correlation between $X_1$ and $X_2$. 
Such a partial correlation coefficient will show how much 
connection there is between grades in first- and second-semester 
English independent of their common connection with grades 
in verbal scholastic-aptitude tests. The coefficient of partial 
correlation, indicated in this instance as $r_{12.3}$, will measure the 
degree of this independent association.¹ 

¹ The position of the point in the subscripts of $r_{12.3}$, rather than the fact 
that it is a smaller letter, distinguishes it from $R_{1.23}$. In the latter, the 
point comes after the first digit, setting off the two independent variables 
$X_2$ and $X_3$, jointly associated with the dependent variable $X_1$. In the 
coefficient of partial correlation, the point sets off the variable that is held 
constant, coming immediately after the pair that are correlated, $X_1$ and 
$X_2$. Thus, in $r_{12.3}$, $X_3$ is held constant while $X_1$ and $X_2$ are correlated; 
in $r_{12.345}$, $X_3$, $X_4$, and $X_5$ are held constant while $X_1$ and $X_2$ are correlated. 
The symbol $R_{1.2345}$, by the position of the point, indicates a 
multiple-correlation coefficient between $X_1$, dependent, and $X_2$, $X_3$, $X_4$, and $X_5$ taken 
jointly as the independent variables. 

A variable is, of course, not held constant in any physical 
sense. It is not possible in any way ex post facto to change the 
fact that a Mount Holyoke freshman, who had grades of 160 
in first-semester English and 160 in second-semester English, 
had also a grade of 437 in her verbal scholastic-aptitude test; 
nor is it possible to change the fact that another Mount Holyoke 
freshman, who had grades of 120 in first-semester English and 
160 in second-semester English, had also a grade of 384 in her 
verbal scholastic-aptitude test. The ideal of holding constant 
one of the three variables is wholly a statistical idea. It consists 
in eliminating from each of the two variables between which 
the partial correlation is sought the effect of the third variable. 
More specifically, the line of regression of $X_1$ on $X_3$ is found, and 
the deviations of the actual values of $X_1$ from those given by the 
line of regression $X_1' = a_{13} + b_{13}X_3$ are determined. These 
deviations from the line of regression represent the variation in 
$X_1$ that is left over after the linear effect of $X_3$ is eliminated. 
Similarly, the line of regression of $X_2$ on $X_3$ is computed, and 
the deviations of the actual values of $X_2$ from those given by the 
line of regression $X_2' = a_{23} + b_{23}X_3$ are determined. These 
deviations from the line of regression represent the variation in 
$X_2$ that is left over after the linear effect of $X_3$ is eliminated. 
When these residual deviations in $X_1$ and $X_2$ are correlated, the 
result is the partial-correlation coefficient between $X_1$ and $X_2$ 
when $X_3$ is held constant, because the effect of $X_3$ upon each of 
them has been eliminated. 
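The residual procedure just described can be sketched directly. The data and the helper names (`residuals`, `corr`, `partial_corr_by_residuals`) are hypothetical; the point is only to show the two lines of regression on $X_3$ being removed before the leftover deviations are correlated.

```python
import math


def residuals(y, x):
    """Deviations of y from its least-squares line of regression on x."""
    n = len(y)
    my, mx = sum(y) / n, sum(x) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for yi, xi in zip(y, x)]


def corr(u, v):
    """Simple correlation coefficient."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def partial_corr_by_residuals(x1, x2, x3):
    """Correlation of X1 and X2 after the linear effect of X3 is removed."""
    return corr(residuals(x1, x3), residuals(x2, x3))


# Hypothetical scores, for illustration only.
X1 = [4, 7, 6, 12, 10, 15]
X2 = [1, 2, 3, 4, 5, 6]
X3 = [2, 1, 4, 3, 6, 5]
print(partial_corr_by_residuals(X1, X2, X3))
```

This route is laborious by hand, which is why a formula in terms of the simple correlation coefficients is preferred in practice.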

To calculate a partial coefficient of correlation, the extended 
calculations involved in computing two lines of regression and 
measuring the deviations of the actual values from them are not 
necessary. The coefficient of partial correlation can be 
algebraically evaluated in a formula that makes it possible to compute 
it from the coefficients of simple linear correlation, as follows: 

The deviation of $X_1$ from the line of regression of $X_1$ on $X_3$ 
may be written as $x_1 - x_1'$ or $x_1 - r_{13}\frac{\sigma_1}{\sigma_3}x_3$, where the $x$'s are 
measured from their respective means. Similarly, the deviation 
of $X_2$ from the line of regression of $X_2$ on $X_3$ may be written as 
$x_2 - x_2'$ or $x_2 - r_{23}\frac{\sigma_2}{\sigma_3}x_3$. The standard deviations of these 
deviations from the lines of regression have already been 
determined to be $\sigma_{1.3}$ and $\sigma_{2.3}$, respectively. In accordance with the 
ordinary formula for a simple correlation coefficient, that is, 
$r = \sum x_1x_2/N\sigma_1\sigma_2$, the partial correlation coefficient between $X_1$ 
and $X_2$, when $X_3$ is held constant, is, by definition, 

$$r_{12.3} = \frac{\sum (x_1 - x_1')(x_2 - x_2')}{N\sigma_{1.3}\sigma_{2.3}} \tag{22}$$



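If the numerator is expanded and the values for $\sigma_{1.3}$ and $\sigma_{2.3}$ 
are substituted in the denominator, this becomes 

$$r_{12.3} = \frac{\sum x_1x_2 - r_{23}\frac{\sigma_2}{\sigma_3}\sum x_1x_3 - r_{13}\frac{\sigma_1}{\sigma_3}\sum x_2x_3 + r_{13}r_{23}\frac{\sigma_1\sigma_2}{\sigma_3^2}\sum x_3^2}{N\sigma_1\sqrt{1 - r_{13}^2}\;\sigma_2\sqrt{1 - r_{23}^2}}$$

Upon transferring the divisor $N\sigma_1\sigma_2$ from the denominator to 
each term of the numerator, this becomes 

$$r_{12.3} = \frac{\frac{\sum x_1x_2}{N\sigma_1\sigma_2} - r_{23}\frac{\sum x_1x_3}{N\sigma_1\sigma_3} - r_{13}\frac{\sum x_2x_3}{N\sigma_2\sigma_3} + r_{13}r_{23}\frac{\sum x_3^2}{N\sigma_3^2}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$$

in which $r_{12}$, $r_{13}$, $r_{23}$ can be substituted for their respective 
equivalent values, making the formula appear as follows: 

$$r_{12.3} = \frac{r_{12} - r_{23}r_{13} - r_{13}r_{23} + r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$$

which reduces to 

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} \tag{23}$$

Similar formulas for the partial correlation between $X_1$ and 
$X_3$ when $X_2$ is held constant and the partial correlation between 
$X_2$ and $X_3$ when $X_1$ is held constant are as follows:¹ 

$$r_{13.2} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} \tag{23'}$$

$$r_{23.1} = \frac{r_{23} - r_{12}r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}} \tag{23''}$$

From Eq. (23) it can be seen that if $X_1$ and $X_2$ are both 
uncorrelated with $X_3$, that is, if $r_{13}$ and $r_{23}$ are zero, then $r_{12.3} = r_{12}$. 
Similarly, if $r_{12}$ and $r_{23}$ are zero, $r_{13.2} = r_{13}$; and if $r_{12}$ and $r_{13}$ 
are zero, $r_{23.1} = r_{23}$. 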
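Equations (23), (23'), and (23'') share one pattern, so a single function covers all three. The sketch below uses a hypothetical function name `partial_corr` and made-up correlations; the second example is chosen so that the whole of the simple correlation between two variables is accounted for by the third.

```python
import math


def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation of a and b with c held constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1.0 - r_ac ** 2) * (1.0 - r_bc ** 2))


# Hypothetical simple correlations, for illustration only.
r12, r13, r23 = 0.7, 0.5, 0.4
r12_3 = partial_corr(r12, r13, r23)   # Eq. (23)
r13_2 = partial_corr(r13, r12, r23)   # Eq. (23')
r23_1 = partial_corr(r23, r12, r13)   # Eq. (23'')
print(r12_3, r13_2, r23_1)

# A "common cause" case: r12 equals r13 * r23, so the partial
# correlation is zero apart from rounding.
print(partial_corr(0.56, 0.8, 0.7))
```

Because the order of the first two arguments does not affect the result, the function also reflects the rule that $r_{12.3} = r_{21.3}$.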


¹ For the coefficient of partial correlation, the order of the numbers in the 
subscripts either before or after the point is a matter of indifference; that 
is, $r_{12.3} = r_{21.3}$ and $r_{12.345} = r_{21.435}$, etc. It will be remembered that this is 
not true with respect to the order of the numbers in the subscripts before 
the point in the $b$'s and the $\beta$'s; that is, $b_{12.3} \neq b_{21.3}$, $\beta_{12.3} \neq \beta_{21.3}$. 



Any one of the following formulas, which can be readily 
derived algebraically [see Eq. (9)], may be used in place of, or as 
a check on, Eq. (23): 

[Equations (24) to (27) are not legible in the source.] 

Thus the partial correlation coefficient can be calculated directly 
from the $\beta$'s or from the $b$'s and the dispersion formulas for the 
simple lines of regression, as well as from the simple $r$'s. The 
equations for the other coefficients of partial correlation are 
symmetrical with Eqs. (24) to (27). 

The partial coefficients of correlation illustrated above are 
called correlation coefficients of the "first order," while the 
simple coefficients $r_{12}$, $r_{13}$, etc., are called "zero-order" correlation 
coefficients. If there are more than three variables involved so 
that a partial coefficient of correlation, $r_{12.34}$, for example, is 
found, it is called a "second-order" coefficient of correlation; 
similarly, $r_{12.345}$ is a "third-order" coefficient of correlation, etc. 
This classification is helpful in distinguishing different sets of 
correlation coefficients. The same terminology may be 
conveniently carried over to the other statistics in a correlation 
problem. Thus, $\sigma_1$ is a zero-order standard deviation, $\sigma_{1.2}$ is a 
first-order standard deviation, etc.; $b_{12.3}$ is a first-order regression 
statistic, $b_{12.34}$ is a second-order regression statistic, etc. 

ANALYSIS OF VARIANCE IN MULTIPLE CORRELATION 
When a plane of regression, for example, $X_1$ on $X_2$ and $X_3$, 
is fitted to a trivariate frequency distribution by the method of 
least squares, variation in $X_1$ may be viewed as made up of a 
part that is due to its linear association with $X_2$, a second part 
that is due to its linear association with $X_3$, and a third part 
that is due to association with factors independent of both $X_2$ 
and $X_3$. For the least-squares Equations (6) show that 
$\sum x_2d = 0$ and $\sum x_3d = 0$, which means that neither $X_2$ nor $X_3$ 
is linearly correlated with deviations from the plane of 
regression. In the case of a normal trivariate frequency distribution, 
the independent variables are not correlated in any way with 
the deviations from the plane of regression. Owing to the lack 
of correlation, the variance in the dependent variable is equal to 
the variance of the values given by the plane of regression plus 
the variance of the deviations from the plane. This may be 
shown as follows: 


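$$x_1 = x_1' + (x_1 - x_1')$$

and 

$$x_1^2 = (x_1')^2 + 2(x_1 - x_1')x_1' + (x_1 - x_1')^2 \tag{28}$$

If all the deviations squared, like $x_1^2$, described in Eq. (28) are 
added, the following result is obtained: 

$$\sum x_1^2 = \sum (x_1')^2 + \sum (x_1 - x_1')^2 + 2\sum (x_1 - x_1')x_1'$$

or, by substituting $x_1' = \left(\beta_{12.3}\frac{x_2}{\sigma_2} + \beta_{13.2}\frac{x_3}{\sigma_3}\right)\sigma_1$ for the last $x_1'$, 

$$\sum x_1^2 = \sum (x_1')^2 + \sum (x_1 - x_1')^2 + 2\sigma_1\beta_{12.3}\sum (x_1 - x_1')\frac{x_2}{\sigma_2} + 2\sigma_1\beta_{13.2}\sum (x_1 - x_1')\frac{x_3}{\sigma_3}$$

But, as just stated, the deviations $x_1 - x_1'$ are not linearly 
correlated with $X_2$ and $X_3$, so that the cross-product terms are zero. 
Therefore, 

$$\sum x_1^2 = \sum (x_1')^2 + \sum (x_1 - x_1')^2$$

If each term is divided by $N$, it is found that 

$$\sigma_1^2 = \sigma_{x_1'}^2 + \sigma_{1.23}^2 \tag{29}$$

In Eq. (29), $\sigma_{x_1'}^2$ may be further evaluated, as follows:¹ 

$$\sum (x_1')^2 = \sigma_1^2\left(\beta_{12.3}^2\frac{\sum x_2^2}{\sigma_2^2} + \beta_{13.2}^2\frac{\sum x_3^2}{\sigma_3^2} + 2\beta_{12.3}\beta_{13.2}\frac{\sum x_2x_3}{\sigma_2\sigma_3}\right)$$

¹ Since $x_1' = \left(\beta_{12.3}\frac{x_2}{\sigma_2} + \beta_{13.2}\frac{x_3}{\sigma_3}\right)\sigma_1$. See p. 411. 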



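If the above is divided by $N$ and $\sigma_{x_1'}^2$, $\sigma_2^2$, $\sigma_3^2$, and $r_{23}$ are substituted 
for equivalent values, the expression becomes 

$$\sigma_{x_1'}^2 = \sigma_1^2(\beta_{12.3}^2 + \beta_{13.2}^2 + 2r_{23}\beta_{12.3}\beta_{13.2}) \tag{30}$$

By substituting this value for $\sigma_{x_1'}^2$ in Eq. (29), it is found that 

$$\sigma_1^2 = \sigma_1^2(\beta_{12.3}^2 + \beta_{13.2}^2 + 2r_{23}\beta_{12.3}\beta_{13.2}) + \sigma_{1.23}^2$$

or 

$$1 = \beta_{12.3}^2 + \beta_{13.2}^2 + 2r_{23}\beta_{12.3}\beta_{13.2} + \frac{\sigma_{1.23}^2}{\sigma_1^2} \tag{31}$$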
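The four terms on the right side of Eq. (31) can be tabulated from the simple correlations alone, since $\sigma_{1.23}^2/\sigma_1^2 = 1 - \beta_{12.3}r_{12} - \beta_{13.2}r_{13}$ by Eq. (19). The following is a sketch with a hypothetical function name `variance_shares` and made-up correlations.

```python
def variance_shares(r12, r13, r23):
    """Terms of Eq. (31) as proportions of the variance of X1."""
    den = 1.0 - r23 ** 2
    b12 = (r12 - r13 * r23) / den      # beta_12.3, Eqs. (9)
    b13 = (r13 - r12 * r23) / den      # beta_13.2, Eqs. (9)
    return {
        "beta_12.3 squared": b12 ** 2,
        "beta_13.2 squared": b13 ** 2,
        "cross product": 2.0 * r23 * b12 * b13,
        "residual": 1.0 - b12 * r12 - b13 * r13,   # Eq. (19) over sigma_1^2
    }


# Hypothetical simple correlations, for illustration only.
shares = variance_shares(0.7, 0.5, 0.4)
for name, value in shares.items():
    print(name, round(value, 4))
print("total", round(sum(shares.values()), 4))
```

The total should come to 1 apart from rounding, whatever admissible set of correlations is supplied.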

From the manner in which Eq. (31) was derived, it is known 
that the terms on the right side each represent a percentage of 
the total variance of $X_1$. This may be interpreted in the 
following manner: 

Fig. 118. Illustration of coefficients of direct determination, meaning of the $2r\beta\beta$ 
cross-product term, and the residual variance. 

Coefficients of Direct Determination. The first term of the 
right side of Eq. (31), $\beta_{12.3}^2$, may be interpreted as the percentage 
of the total variation in $X_1$ that is due to its direct association 
with $X_2$. It has consequently been called the "coefficient of 
direct determination" of $X_1$ by $X_2$. Similarly, $\beta_{13.2}^2$ may be 
interpreted as the percentage of the total variation in $X_1$ that 
is due to its direct association with $X_3$. Figure 118 depicts the 
coefficients of direct determination by arrows pointing from $X_2$ 
directly to $X_1$ and from $X_3$ directly to $X_1$. 

Coefficients of Net Regression. The beta unsquared, $\beta_{12.3}$, 
describes the change in $X_1$ in standard-deviation units that 
accompanies a given change of $X_2$ in standard-deviation units, 
when $X_3$ is constant. Geometrically, $\beta_{12.3}$ is the slope of the line 
of intersection of a plane perpendicular to the $\frac{x_3}{\sigma_3}$ axis with the 
plane of regression, 

$$\frac{x_1'}{\sigma_1} = \beta_{12.3}\frac{x_2}{\sigma_2} + \beta_{13.2}\frac{x_3}{\sigma_3}$$

The statistic $\beta_{12.3}$ has been called the "coefficient of net 
regression" of $X_1$ on $X_2$ in standard-deviation units. The coefficient 
of net regression in original units is $b_{12.3}$. 

Coefficient of Joint Determination. The term $2r_{23}\beta_{12.3}\beta_{13.2}$ may 
be taken as representing the percentage of the total variation in 
$X_1$ that is due to the joint or combined effect of $X_2$ and $X_3$ 
resulting from the correlation between these two variables. In 
Fig. 118 the influence of $X_3$ on $X_1$ through its correlation with $X_2$ 
is depicted by the line $aa'$; the influence of $X_2$ on $X_1$ through its 
correlation with $X_3$ is depicted by the line $bb'$. Relationships 
along these lines indicate the significance of the $2r\beta\beta$ cross-product 
term of Eq. (31). While variation in $X_2$ may directly affect 
$X_1$, it may also, through its correlation with $X_3$, bring about a 
change in $X_3$ and hence cause further variation in $X_1$ resulting 
from the connection between $X_3$ and $X_1$. Similarly, variation 
in $X_3$ may affect $X_1$, not only directly, but also indirectly 
through the association of $X_3$ and $X_2$. The term $2r_{23}\beta_{12.3}\beta_{13.2}$ 
may be taken to represent the combined indirect variation in 
$X_1$ resulting from variations in $X_2$ and $X_3$. 

Meaning of Residual Variance. The portion of variance in 
$X_1$ due directly to $X_2$ is $\beta_{12.3}^2$; the portion due directly to $X_3$ is 
$\beta_{13.2}^2$; the portion due to the joint influence of $X_2$ and $X_3$ is the 
$2r\beta\beta$ cross-product term; the remainder of the variance is due 
to other factors not linearly correlated with $X_2$ and $X_3$. This 
is depicted in Fig. 118. The sum of all of these four terms is 
equal to the total variance, or, expressed as a proportion, to 1. 
The sum of the first three terms may be interpreted as the total 
portion of variance in $X_1$ that is due to its association with 
$X_2$ and $X_3$ jointly; the final term, $\sigma_{1.23}^2/\sigma_1^2$, represents the portion 
of the variance in $X_1$ that is due to its association with other 
factors not linearly correlated with $X_2$ and $X_3$. The sum of all 
these portions of the total variance of $X_1$ is necessarily equal to 1. 



The Coefficient of Multiple Correlation. From the previous 
discussion regarding the interpretation of the simple correlation 
coefficient and the correlation ratio, it is to be expected that a 
similar interpretation might be made of the multiple-correlation 
coefficient, that is, that $R_{1.23}^2$ represents the portion of the total 
variance in $X_1$ which is due to its joint association with $X_2$ and 
$X_3$. Equation (31) shows that this is actually the case, for it 
will be recalled that 

$$R_{1.23}^2 = 1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}$$

Therefore, it follows by Eqs. (29), (30), and (31) that 

$$R_{1.23}^2 = \beta_{12.3}^2 + \beta_{13.2}^2 + 2r_{23}\beta_{12.3}\beta_{13.2} = \frac{\sigma_{x_1'}^2}{\sigma_1^2} \tag{32}$$

This interpretation of $R_{1.23}^2$ is similar to that previously made of 

$$r_{12}^2 = \frac{\sigma_{x_1'}^2}{\sigma_1^2}$$

where $\sigma_{x_1'}$ is the standard deviation of the line of regression. 
It has been noted that $R_{1.23}^2$ can be interpreted as the portion 
of the total variance of $X_1$ that may be attributed to its joint 
association with $X_2$ and $X_3$. It has also been shown that 
$R_{1.23}^2 = \beta_{12.3}r_{12} + \beta_{13.2}r_{13}$ [see Eq. (20)]. Consequently, it is 
possible to view $\beta_{12.3}r_{12}$ as the portion of the variance of $X_1$ 
that is due to its total association, both direct and indirect, with 
$X_2$ and also to view $\beta_{13.2}r_{13}$ as the portion of the variance of $X_1$ 
that is due to its total association, both direct and indirect, 
with $X_3$. Therefore, these two products have been called 
"coefficients of total determination" of $X_1$ by $X_2$ and $X_3$. 

When either of the products is negative, however,¹ it is 
preferable to resolve the expression into its equal, namely, 

$$\beta_{12.3}^2 + \beta_{13.2}^2 + 2r_{23}\beta_{12.3}\beta_{13.2}$$

because 

$$R_{1.23}^2 = \beta_{12.3}r_{12} + \beta_{13.2}r_{13} = \beta_{12.3}^2 + \beta_{13.2}^2 + 2r_{23}\beta_{12.3}\beta_{13.2}$$

Expressed in the $\beta^2$ and cross-product form, it is easier to 
understand why a negative value of $\beta_{12.3}r_{12}$ or of $\beta_{13.2}r_{13}$ (but not of 
both) may occur; for whenever either $\beta_{12.3}r_{12}$ or $\beta_{13.2}r_{13}$ is 
negative, it will be found that the joint contribution of $X_2$ and $X_3$ 
to the variance of $X_1$, represented by $2r_{23}\beta_{12.3}\beta_{13.2}$, is also 
negative. This follows because either $r_{23}$ is negative or $\beta_{12.3}$ and 
$\beta_{13.2}$ are of opposite signs. In such a case, the direct effect of 
$X_2$, for example, on the variation of $X_1$ would be opposite in 
sign to its indirect effect, that is, through its correlation with 
$X_3$; and the existence of this indirect link to $X_1$ through $X_3$ 
would tend to diminish the total variation caused in $X_1$ by 
changes in $X_2$. This dampening effect of a negative value for 
the $2r\beta\beta$ cross-product term is what explains the existence of a 
negative value for either $\beta_{12.3}r_{12}$ or $\beta_{13.2}r_{13}$. The form where they 
are squared also shows the difficulty in trying to assign to the 
variables $X_2$ and $X_3$ an independent part in accounting for the 
variation in $X_1$. When there is a joint contribution by the two 
variables, it becomes misleading to attempt to break it up and 
assign part of it to one and part to the other, as the foregoing 
interpretation based upon the form $\beta_{12.3}r_{12} + \beta_{13.2}r_{13}$ appears to 
do. 

¹ The interpretation of the products as coefficients of total determination 
runs into difficulties in any particular case if either of them is negative. 
Either $\beta_{12.3}r_{12}$ or $\beta_{13.2}r_{13}$, but not both, may be negative, since their sum 
equals $R_{1.23}^2$, which, of course, is not negative. To say that a variable 
contributes a negative percentage to the total variance of $X_1$ has no 
meaning, and consequently when either term is negative the interpretation is 
imaginary. 
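A numerical case in which one coefficient of total determination turns out negative may help. The correlations below are hypothetical, chosen so that $X_2$ is only weakly correlated with $X_1$ but strongly correlated with $X_3$; the function name `total_determination` is illustrative.

```python
def total_determination(r12, r13, r23):
    """Products beta*r and the beta-squared / cross-product resolution."""
    den = 1.0 - r23 ** 2
    b12 = (r12 - r13 * r23) / den      # beta_12.3
    b13 = (r13 - r12 * r23) / den      # beta_13.2
    return {
        "beta12.3*r12": b12 * r12,
        "beta13.2*r13": b13 * r13,
        "direct_sum": b12 ** 2 + b13 ** 2,
        "joint": 2.0 * r23 * b12 * b13,
    }


# Hypothetical correlations, for illustration only.
parts = total_determination(0.1, 0.8, 0.6)
r_squared = parts["beta12.3*r12"] + parts["beta13.2*r13"]

print(parts["beta12.3*r12"])   # negative in this example
print(parts["joint"])          # the joint term is negative as well
print(r_squared, parts["direct_sum"] + parts["joint"])
```

Both resolutions give the same $R_{1.23}^2$, but only the second keeps the direct terms positive and shows the negative contribution as arising jointly.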

As noted above, the part of the variance that is due to 
correlation between $X_1$ and $X_2$ and $X_3$ together is described as 
follows:¹ 

$$\sigma_{x_1'}^2 = R_{1.23}^2\sigma_1^2$$

The right side of this expression describes the variance of the 
plane of regression, just as $r_{12}^2\sigma_1^2$ describes the variance of the 
line of regression. This part of the variance in the $X_1$ variable is 
made up of two parts that may be analyzed as follows:² 

$$\sigma_{1.23}^2 = \sigma_{1.2}^2(1 - r_{13.2}^2)$$

$$= \sigma_{1.2}^2 - r_{13.2}^2\sigma_{1.2}^2$$

$$= \sigma_1^2(1 - r_{12}^2) - r_{13.2}^2\sigma_{1.2}^2$$

$$= \sigma_1^2 - r_{12}^2\sigma_1^2 - r_{13.2}^2\sigma_{1.2}^2$$

¹ Cf. Eqs. (29) and (31). 
² See footnote, p. 416. 



-428 ‘ijum oi- Diviinuhi> iM> Mihin xniijhci 
Therefore, by transposing terms,

$$\sigma_1^2 = r_{12}^2\sigma_1^2 + r_{13.2}^2\sigma_{1.2}^2 + \sigma_{1.23}^2 \qquad (33)$$

It follows from Eqs. (33) and (29) that

$$R_{1.23}^2\sigma_1^2 = r_{12}^2\sigma_1^2 + r_{13.2}^2\sigma_{1.2}^2$$

in which $\sigma_{1.2}^2$ is the scatter about the line of regression of $X_1$ on $X_2$.

It is thus seen that the variance of the plane of regression
consists of two parts: (1) the variance due to the total linear
association between $X_1$ and $X_2$, which is $r_{12}^2\sigma_1^2$, and (2) of the
remaining variance about the line of regression ($\sigma_{1.2}^2$), the portion
that is due to the influence of $X_3$ not already included in $r_{12}$,
namely, $r_{13.2}^2\sigma_{1.2}^2$. The partial coefficient of correlation $r_{13.2}$
describes the relationship between $X_1$ and $X_3$ when $X_2$ is held
constant; in other words, it is the net correlation between $X_1$ and
$X_3$. The square of this partial coefficient of correlation is
therefore the coefficient of proportional variance in $X_1$ due to net
correlation with $X_3$. If to variance due to total association with
$X_2$ is added the variance due to net correlation with $X_3$, the result
is the variance due to total correlation with $X_2$ and $X_3$ jointly.
The remainder of the variance, as Eq. (33) indicates, is due to
other causes not linearly correlated with $X_2$ and $X_3$.
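The breakdown in Eq. (33) can be tried on figures. The following Python sketch is only an illustration: the function name is hypothetical, and the sample values ($\sigma_1^2$ = 1,930.3155, $r_{12}$ = 0.89576, $r_{13.2}$ = 0.27146, $\sigma_{1.23}^2$ = 353.3585) are borrowed from the worked example of Chap. XVII, so the three parts add up to the total variance only within rounding.

```python
def variance_parts(var1, r12, r13_2, var1_23):
    """Split the variance of X1 into the three parts of Eq. (33)."""
    var1_2 = var1 * (1 - r12 ** 2)      # scatter about the line of X1 on X2
    total_assoc = r12 ** 2 * var1       # due to total association with X2
    net_assoc = r13_2 ** 2 * var1_2     # due to net correlation with X3
    return total_assoc, net_assoc, var1_23

# Rounded figures from the worked example of Chap. XVII
parts = variance_parts(1930.3155, 0.89576, 0.27146, 353.3585)
for label, value in zip(("total association with X2",
                         "net correlation with X3",
                         "other causes"), parts):
    print(f"{label}: {value:.2f}")
print(f"sum of parts: {sum(parts):.2f}")
```

The first two parts together are the variance of the plane of regression, $R_{1.23}^2\sigma_1^2$.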
Analysis of Variance and Causal Relationships. Where other
knowledge suggests that causal relationships run only in one
direction, the preceding analysis takes on considerable
significance. In biological investigations, for example, where the
effect of heredity is being studied, it seems logical to assume
that variations in parents cause variations in offspring and that
the causal relationship does not run the other way. Again, in
certain economic problems, it is to be assumed that fluctuations
in weather conditions bring about changes in economic
conditions, but that the latter have no effect upon the former. In
one-directional setups of this kind, the $\beta^2$'s take on the full
significance of the connotation "coefficients of determination." The
$\beta^2$'s measure the amount of variation in the dependent variable
caused by fluctuations in each of the independent variables
separately, and in conjunction with the other variables in the
$2r\beta$ cross-product expression. It is in this type of problem that
correlation analysis attains its greatest significance.





Where there is no reason to believe that causal relationships
are unilateral, the interpretation of the results of correlation
analysis in terms of causal determination becomes unscientific.
In most problems there is interaction between the variables
rather than a strictly one-directional association. It is still
possible to estimate what is called the dependent variable from
knowledge of so-called independent variables, but the latter
must not be looked upon as determining the former. With
reference to a regression equation used for purposes of estimation
and prediction, the $\beta^2$'s and the $2r\beta$ cross-product terms are
useful in showing how important certain factors are, separately
and in conjunction with other factors, in making estimates or
predictions of the dependent variable. They become coefficients
of determination only in the sense of statistical determination or
estimation and not in any sense of physical, biological, or
economic causation.

Of incidental importance to method and theory but of
considerable importance to practice, Eq. (31) provides a cross check
on the arithmetical work for finding most of the statistics in the
multiple-correlation problem.


EXTENSION OF MULTIPLE- AND PARTIAL-CORRELATION 
ANALYSIS TO FOUR VARIABLES 

The foregoing analysis, which has pertained to only three 
variables, may be extended to cover a greater number of vari- 
ables. In this section its extension to a four-variable problem 
will be discussed. In the next section its extension to any desired 
number of variables will be considered. 

When there are more than two independent variables in a
regression equation, the number of $\beta$ coefficients to be determined
is correspondingly increased. If there were four variables,
for example, that is, three independent variables, the form
of the regression equation would become

$$\frac{x_1}{\sigma_1} = \beta_{12.34}\frac{x_2}{\sigma_2} + \beta_{13.24}\frac{x_3}{\sigma_3} + \beta_{14.23}\frac{x_4}{\sigma_4} \qquad (34)$$


If this regression plane is fitted by the method of least squares,
the $\beta$'s can be determined in terms of the r's in the manner
described for the three-variable problem; but, there being three
equations to be solved for three $\beta$'s, the solutions are not so simple
as those given by Eqs. (8) and (9). Some special method of
solution may be employed, such as the so-called "Doolittle
method" of substitution or some determinant method.¹

The chief advantage claimed for the Doolittle method is that
it provides a check at each step in the problem, but experience
for several years with its use for four-variable correlation
problems has demonstrated its complexity and sensitivity. For a
multivariate problem of more than four variables both the
Doolittle work sheet and the determinant methods become
increasingly cumbersome.

A saving in computation is obtained by using formulas based
upon the algebraic evaluation of correlation statistics. This
saving was demonstrated for the trivariate case in the first part
of this chapter, where it was found that the $\beta$'s could be evaluated
in terms of the zero-order r's [see Eq. (9)]. In addition, however,
it is possible by symmetry to extend the algebraic results of
some of the formulas to apply to a multivariate correlation of
as many variables as desired, in effect to evaluate algebraically
the correlation statistics for the general regression equation

$$X_1' = a_{1.23\ldots m} + b_{12.34\ldots m}X_2 + b_{13.24\ldots m}X_3 + \cdots + b_{1m.23\ldots(m-1)}X_m \qquad (35)$$


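First the correlation statistics will be algebraically evaluated
for the four-variable case, which is done as follows.
The normal least-squares equations for fitting a plane of
regression of $X_1$ on $X_2$, $X_3$, and $X_4$, when divided by N, are
as follows:²

$$\frac{\Sigma x_1x_2}{N\sigma_1\sigma_2} = \beta_{12.34}\frac{\Sigma x_2^2}{N\sigma_2^2} + \beta_{13.24}\frac{\Sigma x_2x_3}{N\sigma_2\sigma_3} + \beta_{14.23}\frac{\Sigma x_2x_4}{N\sigma_2\sigma_4}$$

$$\frac{\Sigma x_1x_3}{N\sigma_1\sigma_3} = \beta_{12.34}\frac{\Sigma x_2x_3}{N\sigma_2\sigma_3} + \beta_{13.24}\frac{\Sigma x_3^2}{N\sigma_3^2} + \beta_{14.23}\frac{\Sigma x_3x_4}{N\sigma_3\sigma_4}$$

$$\frac{\Sigma x_1x_4}{N\sigma_1\sigma_4} = \beta_{12.34}\frac{\Sigma x_2x_4}{N\sigma_2\sigma_4} + \beta_{13.24}\frac{\Sigma x_3x_4}{N\sigma_3\sigma_4} + \beta_{14.23}\frac{\Sigma x_4^2}{N\sigma_4^2}$$

These may be written, by substituting for their equivalent
values, $r_{12}$, $\sigma_2^2$, $r_{23}$, $r_{13}$, $\sigma_3^2$, $r_{14}$, $r_{34}$, $\sigma_4^2$, as follows: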


¹ In theoretical work the determinant method gives neat and easily
remembered solutions. In practical work, however, many statisticians
prefer the method of substitution.
² See pp. 411-412.




$$r_{12} = \beta_{12.34} + r_{23}\beta_{13.24} + r_{24}\beta_{14.23}$$
$$r_{13} = r_{23}\beta_{12.34} + \beta_{13.24} + r_{34}\beta_{14.23} \qquad (36)$$
$$r_{14} = r_{24}\beta_{12.34} + r_{34}\beta_{13.24} + \beta_{14.23}$$
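The three equations of (36) can also be solved directly by elimination. The Python sketch below is a plain illustration and not one of the work-sheet methods named in the text; the function name is hypothetical, and the six zero-order r's passed to it are made-up values, since the four-variable correlations of the illustration are not given at this point.

```python
def solve_betas(r12, r13, r14, r23, r24, r34):
    """Solve Eqs. (36) for beta12.34, beta13.24, beta14.23 by elimination."""
    # Augmented matrix: unknowns in the first three columns, r1j on the right.
    m = [[1.0, r23, r24, r12],
         [r23, 1.0, r34, r13],
         [r24, r34, 1.0, r14]]
    n = 3
    for col in range(n):
        # Bring the largest remaining coefficient into the pivot position.
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Clear this unknown from every other equation.
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [a - factor * b for a, b in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Hypothetical zero-order r's, chosen only for illustration
b12_34, b13_24, b14_23 = solve_betas(0.6, 0.5, 0.4, 0.3, 0.2, 0.1)
print(b12_34, b13_24, b14_23)
```

Substituting the three results back into Eqs. (36) should reproduce $r_{12}$, $r_{13}$, and $r_{14}$.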

If the first of Eqs. (36) is multiplied by $r_{23}$ and the result is
subtracted from the second, $\beta_{12.34}$ is eliminated and it is found
that

$$r_{13} - r_{12}r_{23} = (1 - r_{23}^2)\beta_{13.24} + (r_{34} - r_{23}r_{24})\beta_{14.23}$$

or

$$\frac{r_{13} - r_{12}r_{23}}{1 - r_{23}^2} = \beta_{13.24} + \frac{r_{34} - r_{23}r_{24}}{1 - r_{23}^2}\beta_{14.23}$$

which, by Eqs. (9), is equivalent to saying

$$\beta_{13.2} = \beta_{13.24} + \beta_{43.2}\beta_{14.23} \qquad (37)$$

Similarly, if the first of Eqs. (36), multiplied by $r_{24}$, is subtracted
from the third, $\beta_{12.34}$ is eliminated and it is found that

$$\beta_{14.2} = \beta_{14.23} + \beta_{34.2}\beta_{13.24} \qquad (38)$$

Correlation Statistics in Terms of Lower-order Correlation
Statistics of the Same Kind. From Eqs. (37) and (38), solved
simultaneously, it is found that

$$\beta_{14.23} = \frac{\beta_{14.2} - \beta_{34.2}\beta_{13.2}}{1 - \beta_{34.2}\beta_{43.2}} \qquad (39)$$

All the other second-order $\beta$'s may be evaluated in a similar
algebraic manner, or written by symmetry; for example, by
symmetry with Eq. (39), the $\beta_{13.24}$ is as follows:

$$\beta_{13.24} = \frac{\beta_{13.2} - \beta_{43.2}\beta_{14.2}}{1 - \beta_{43.2}\beta_{34.2}} \qquad (40)$$
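Equations (39) and (40) lend themselves to a small routine that builds the second-order $\beta$'s from the first-order ones. The following Python sketch is illustrative only: the function names are hypothetical, the first-order $\beta$'s are taken in the form $\beta_{ij.k} = (r_{ij} - r_{ik}r_{jk})/(1 - r_{jk}^2)$ used with Eqs. (9), and the zero-order r's are made-up values.

```python
def beta1(r_ij, r_ik, r_jk):
    """First-order beta_ij.k from three zero-order r's."""
    return (r_ij - r_ik * r_jk) / (1 - r_jk ** 2)

def second_order_betas(r12, r13, r14, r23, r24, r34):
    b13_2 = beta1(r13, r12, r23)
    b14_2 = beta1(r14, r12, r24)
    b34_2 = beta1(r34, r23, r24)   # X3 on X4, X2 held constant
    b43_2 = beta1(r34, r24, r23)   # X4 on X3, X2 held constant
    b14_23 = (b14_2 - b34_2 * b13_2) / (1 - b34_2 * b43_2)   # Eq. (39)
    b13_24 = (b13_2 - b43_2 * b14_2) / (1 - b43_2 * b34_2)   # Eq. (40)
    # The remaining beta follows from the first of Eqs. (36).
    b12_34 = r12 - r23 * b13_24 - r24 * b14_23
    return b12_34, b13_24, b14_23

# Hypothetical zero-order r's, chosen only for illustration
r12, r13, r14, r23, r24, r34 = 0.6, 0.5, 0.4, 0.3, 0.2, 0.1
b12_34, b13_24, b14_23 = second_order_betas(r12, r13, r14, r23, r24, r34)
print(b12_34, b13_24, b14_23)
```

Because Eqs. (37) and (38) were obtained from Eqs. (36) by elimination, the values found this way should also satisfy the second and third of Eqs. (36).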


Equations (39) and (40) are equivalent to the following
expressions in terms of the b's, because, if
$\beta_{14.2} = b_{14.2}\dfrac{\sigma_4}{\sigma_1}$, $\beta_{34.2} = b_{34.2}\dfrac{\sigma_4}{\sigma_3}$, and similar
values of $\beta$'s in terms of corresponding b's and standard-deviation
ratios are substituted, the standard deviations cancel out.¹

¹ Since the order of digits in the subscript after the point is immaterial
in both the b and $\beta$ statistics, the following formulas may be used as checks



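$$b_{14.23} = \frac{b_{14.2} - b_{34.2}b_{13.2}}{1 - b_{34.2}b_{43.2}} \qquad (39')$$

$$b_{13.24} = \frac{b_{13.2} - b_{43.2}b_{14.2}}{1 - b_{43.2}b_{34.2}} \qquad (40')$$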


Correlation Statistics in Terms of Lower-order r's and $\sigma$'s. It is
possible to express the above formulas in terms of lower-order
r's and $\sigma$'s if this method of calculation is preferred. It was
noted in Eq. (26) that


and, by symmetry, it follows that


Also, by Eq. (27),
Consequently,


$$b_{14.2} = r_{14.2}\frac{\sigma_{1.2}}{\sigma_{4.2}}$$

If these values of the $\beta$'s in terms of r's and standard deviations
are substituted in Eq. (39) and the values of the b's in terms of
the r's and the scatter are substituted in Eq. (39'), the following
results are obtained:

(41)
(41')

Correlation Statistics in Terms of Correlation Statistics of Same
Order. The preceding formulas may be transformed to still
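on the ones given in the text:

$$\beta_{14.32} = \frac{\beta_{14.3} - \beta_{24.3}\beta_{12.3}}{1 - \beta_{24.3}\beta_{42.3}} \qquad (39)$$

$$b_{14.32} = \frac{b_{14.3} - b_{24.3}b_{12.3}}{1 - b_{24.3}b_{42.3}} \qquad (39')$$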





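another form in which the correlation statistics are expressed in
terms of other correlation statistics of the same order. Thus,
since $\sigma_{1.34} = \sigma_{1.3}\sqrt{1 - r_{14.3}^2}$ and $\sigma_{2.34} = \sigma_{2.3}\sqrt{1 - r_{24.3}^2}$, it follows
that

$$\sigma_{1.3} = \frac{\sigma_{1.34}}{\sqrt{1 - r_{14.3}^2}} \qquad \sigma_{2.3} = \frac{\sigma_{2.34}}{\sqrt{1 - r_{24.3}^2}}$$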


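If these values are substituted in Eqs. (41) and (41'), the
following results are obtained:

$$\beta_{12.34} = r_{12.34}\frac{\sigma_2\,\sigma_{1.34}}{\sigma_1\,\sigma_{2.34}} \qquad (42)$$

$$b_{12.34} = r_{12.34}\frac{\sigma_{1.34}}{\sigma_{2.34}} \qquad (42')$$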

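Similar algebraic procedure will show that all the other $\beta$'s
and b's have formulas that are symmetrical with the above.
For example,

$$\beta_{13.24} = r_{13.24}\frac{\sigma_3\,\sigma_{1.24}}{\sigma_1\,\sigma_{3.24}} \qquad b_{13.24} = r_{13.24}\frac{\sigma_{1.24}}{\sigma_{3.24}}$$

$$\beta_{14.23} = r_{14.23}\frac{\sigma_4\,\sigma_{1.23}}{\sigma_1\,\sigma_{4.23}} \qquad b_{14.23} = r_{14.23}\frac{\sigma_{1.23}}{\sigma_{4.23}}$$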


Partial Correlation in the Four-variable Case. If there are
four variables $X_1$, $X_2$, $X_3$, and $X_4$, the partial-correlation
coefficient between $X_1$ and $X_2$ when $X_3$ and $X_4$ are held constant,
that is, $r_{12.34}$, is defined as the simple correlation coefficient
between the deviations of $X_1$ from the plane of regression of $X_1$
on $X_3$ and $X_4$ and the deviations of $X_2$ from the plane of
regression of $X_2$ on $X_3$ and $X_4$. Accordingly, partial correlation, when
four variables are involved, is no more complex algebraically
than when three variables are involved; both are simple linear
correlations between residuals. In the three-variable problem
the partial coefficient of correlation is found by correlating the
residuals about lines of regression; in the four-variable problem
the partial coefficient of correlation is found by correlating the
residuals about planes of regression. The formula for the
second-order partial coefficient of correlation $r_{12.34}$ is obtained
in the same algebraic manner as for the first-order partial
coefficients of correlation [see Eqs. (22) and (23)]. The formula
is

$$r_{12.34} = \frac{r_{12.3} - r_{14.3}r_{24.3}}{\sqrt{1 - r_{14.3}^2}\,\sqrt{1 - r_{24.3}^2}} \qquad (43)$$

This is a partial coefficient of correlation of the second order.
It will be noted that the formula runs in terms of the correlation
coefficients of the first order. To make use of this formula in
practice, therefore, it first becomes necessary to calculate the
zero-order correlation coefficients and then the first-order
coefficients before the second-order coefficients can be determined.
The calculation of higher-order partial correlation coefficients is
thus similar to the calculation of those of lower order, but they
require additional work.
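The stepwise procedure just described, zero-order r's first, then first-order, then second-order, can be sketched in Python as follows. The sketch is an illustration under stated assumptions: the function names are hypothetical, the first-order formula is taken in the same form as Eq. (43), and the six zero-order r's are made-up values.

```python
from math import sqrt

def partial(r_ab, r_ac, r_bc):
    """Partial r between a and b with c held constant; same form at every order."""
    return (r_ab - r_ac * r_bc) / (sqrt(1 - r_ac ** 2) * sqrt(1 - r_bc ** 2))

def r12_34(r12, r13, r14, r23, r24, r34):
    # Step 1: first-order partials, X3 held constant
    r12_3 = partial(r12, r13, r23)
    r14_3 = partial(r14, r13, r34)
    r24_3 = partial(r24, r23, r34)
    # Step 2: second-order partial, Eq. (43)
    return partial(r12_3, r14_3, r24_3)

# Hypothetical zero-order r's, chosen only for illustration
zero_order = dict(r12=0.6, r13=0.5, r14=0.4, r23=0.3, r24=0.2, r34=0.1)
result = r12_34(**zero_order)
print(round(result, 5))
```

Since the order of the digits after the point is immaterial, holding $X_4$ constant first and $X_3$ second should lead to the same coefficient, which gives a convenient check on the arithmetic.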

Third-order Variance and Multiple-correlation Coefficient.
The formula for third-order variance in the four-variable
problem is obtained by adding a term to the formula for scatter in
the three-variable problem, as follows:

$$\sigma_{1.234}^2 = \sigma_1^2(1 - \beta_{12.34}r_{12} - \beta_{13.24}r_{13} - \beta_{14.23}r_{14}) \qquad (44)$$

The equation for the multiple-correlation coefficient is, by
symmetry, as follows:

$$R_{1.234}^2 = 1 - \frac{\sigma_{1.234}^2}{\sigma_1^2}$$

or

$$R_{1.234}^2 = \beta_{12.34}r_{12} + \beta_{13.24}r_{13} + \beta_{14.23}r_{14}$$


EXTENSION OF MULTIPLE- AND PARTIAL-CORRELATION
ANALYSIS TO ANY DESIRED NUMBER OF VARIABLES

For a three-variable and even for a four-variable correlation
problem, it is probably desirable first to calculate the $\beta$'s from
the lower-order $\beta$'s. In the three-variable correlation problem,
this means calculating the $\beta$'s from the zero-order r's, which at
that level correspond to the $\beta$'s.





For more than four variables it may be better first to calculate
the higher-order r's from the lower-order r's and thereafter
obtain the b and $\beta$ statistics. Following is the extension to the
general multivariate problem of the various formulas, showing the
series of formulas in some cases from the simple correlation to
the multivariate in order to depict how they are readily obtained
by symmetry.

Extension of Formulas to General Multivariate Problem.
Statistics for the Regression Planes. The b and $\beta$ statistics are
evaluated, in general, by the following formula:

(a) $$b_{12.34\ldots n} = r_{12.34\ldots n}\frac{\sigma_{1.34\ldots n}}{\sigma_{2.34\ldots n}}$$

This formula can be used when the r's and $\sigma$'s of the same order
have already been obtained, but it does not provide for the
calculation of the b's from lower-order statistics.

In the simple linear-regression equation, carrying out the
instructions of the above formula gives $b_{12} = r_{12}\dfrac{\sigma_1}{\sigma_2}$. If there
are three variables, it becomes $b_{12.3} = r_{12.3}\dfrac{\sigma_{1.3}}{\sigma_{2.3}}$, etc.

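The relationship between the b's and the $\beta$'s is not affected
by the number of variables and may be summarized as follows:

(b) $$\beta_{12.3} = b_{12.3}\frac{\sigma_2}{\sigma_1} \qquad \beta_{13.2} = b_{13.2}\frac{\sigma_3}{\sigma_1}$$

$$\beta_{12.34} = b_{12.34}\frac{\sigma_2}{\sigma_1} \qquad \beta_{13.24} = b_{13.24}\frac{\sigma_3}{\sigma_1}$$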


The a's for the various planes of regression are found by formulas
such as the following:

(c) $$a_{1.234} = \bar{X}_1 - b_{12.34}\bar{X}_2 - b_{13.24}\bar{X}_3 - b_{14.23}\bar{X}_4$$

$$a_{2.134} = \bar{X}_2 - b_{21.34}\bar{X}_1 - b_{23.14}\bar{X}_3 - b_{24.13}\bar{X}_4$$

$$a_{1.23\ldots n} = \bar{X}_1 - b_{12.3\ldots n}\bar{X}_2 - \cdots - b_{1n.23\ldots(n-1)}\bar{X}_n$$

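High-order Variances. The higher-order variances may be
estimated by using either the $\beta$ formulas or the partial r formulas.

(d) $$\sigma_{1.234}^2 = \sigma_1^2(1 - \beta_{12.34}r_{12} - \beta_{13.24}r_{13} - \beta_{14.23}r_{14})$$

$$\sigma_{1.23\ldots n}^2 = \sigma_1^2(1 - \beta_{12.3\ldots n}r_{12} - \cdots - \beta_{1n.23\ldots(n-1)}r_{1n})$$

and

$$\sigma_{1.2}^2 = \sigma_1^2(1 - r_{12}^2)$$
$$\sigma_{1.23}^2 = \sigma_1^2(1 - r_{12}^2)(1 - r_{13.2}^2)$$
$$\sigma_{1.234}^2 = \sigma_1^2(1 - r_{12}^2)(1 - r_{13.2}^2)(1 - r_{14.23}^2)$$
$$\sigma_{1.23\ldots n}^2 = \sigma_1^2(1 - r_{12}^2)(1 - r_{13.2}^2)\cdots(1 - r_{1n.23\ldots(n-1)}^2)$$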
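That the $\beta$ form and the partial-r form of (d) lead to the same variance may be verified on figures. The Python sketch below does this for the three-variable case only; it is an illustration, with the zero-order r's and the variance of $X_1$ taken from Tables 52 and 53 of Chap. XVII, and with the first-order $\beta$ and partial r written in the usual three-variable forms.

```python
from math import sqrt

# Zero-order r's (Table 53) and variance of X1 (Table 52), Chap. XVII
r12, r13, r23 = 0.89576, 0.62894, 0.59373
var1 = 1930.3155

# First-order betas and partial r in the usual three-variable forms
beta12_3 = (r12 - r13 * r23) / (1 - r23 ** 2)
beta13_2 = (r13 - r12 * r23) / (1 - r23 ** 2)
r13_2 = (r13 - r12 * r23) / (sqrt(1 - r12 ** 2) * sqrt(1 - r23 ** 2))

# Second-order variance by the beta formula and by the partial r formula
var_by_beta = var1 * (1 - beta12_3 * r12 - beta13_2 * r13)
var_by_partial = var1 * (1 - r12 ** 2) * (1 - r13_2 ** 2)

print(f"beta formula:      {var_by_beta:.4f}")
print(f"partial r formula: {var_by_partial:.4f}")
```

The two results agree to the limit of machine precision; they differ slightly from hand-computed values only because the hand work rounds each intermediate figure.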

The Multiple-correlation Formulas. As in the case of the
relationship between the b's and the $\beta$'s, the formula for the
multiple-correlation coefficient is of the same form regardless
of the number of variables; it is always a comparison of the
residual scatter about a plane of regression with the total standard
deviation. In the four-variable and the general multiple-variable
cases it is as follows:

$$R_{1.234}^2 = 1 - \frac{\sigma_{1.234}^2}{\sigma_1^2} \qquad R_{1.23\ldots n}^2 = 1 - \frac{\sigma_{1.23\ldots n}^2}{\sigma_1^2}$$



The Partial-correlation Formulas. The forms of the formulas for
the partial coefficients of correlation are also independent of the
number of variables in the problem, because the partial
correlation coefficient always pertains to a simple linear correlation.
The formulas are therefore symmetrical, as follows:
Second-order partials:

$$r_{12.34} = \frac{r_{12.3} - r_{14.3}r_{24.3}}{\sqrt{1 - r_{14.3}^2}\,\sqrt{1 - r_{24.3}^2}}$$

And in general, the formula for calculating partials of a given
order from the next lower order partials:

$$r_{12.34\ldots n} = \frac{r_{12.34\ldots(n-1)} - r_{1n.34\ldots(n-1)}\,r_{2n.34\ldots(n-1)}}{\sqrt{1 - r_{1n.34\ldots(n-1)}^2}\,\sqrt{1 - r_{2n.34\ldots(n-1)}^2}}$$




CHAPTER XVII 


ANALYSIS OF A MULTIVARIATE FREQUENCY 
DISTRIBUTION ILLUSTRATED 

To illustrate the analysis of a multivariate frequency
distribution, data on grades of 81 freshmen in a woman's college
were selected. These are arranged in Table 47 in such a way
as to facilitate the construction of simple linear-correlation tables
and to facilitate detailed study of the multivariate distribution.
The first part of this chapter will illustrate trivariate-frequency-
distribution analysis. The $X_4$ variable is included in the table
so that later in the chapter the analysis can be extended to four
variables. Beyond four variables, the method proceeds in a
symmetrical fashion.

Examination of Multivariate Distribution. The first step in
the analysis of a multivariate distribution is to determine how
well the assumption of linearity of relationship is approximated.
The scatter of cases in the correlation table for $X_1$ and $X_2$
appears to indicate simple linear regression between $X_1$ and $X_2$.*
In Tables 48 and 49, correlation tables for $X_1$ and $X_3$ and for
$X_2$ and $X_3$, respectively, the scatter of cases suggests that these
regressions might be only slightly if at all nonlinear. As pointed
out in the preceding chapter, however, the simple correlation
charts cannot be expected to reveal how much net regression
exists in a trivariate correlation problem; accordingly, further
study of the trivariate distribution of $X_1$, $X_2$, and $X_3$ should be
undertaken. In order to do this the multivariate distribution
itself, shown in Table 47, must be studied.

For example, the net regression of $X_2$ on $X_3$ may be tested
in a preliminary fashion by holding constant the $X_1$ variable.
Accordingly, analysis may be made of the $X_2$ and $X_3$ grades of
the 16 students whose $X_1$ grades are 200. Table 50 shows the
$X_2$ and $X_3$ grades of these 16 freshmen, selected from Table 47.


* See Table 34 (p. 358).







Table 47. Grades of 81 Mount Holyoke Freshmen in Verbal
Scholastic-aptitude Test, in College Board English Examination, and in
First- and Second-semester English. (Continued)

Student   Second-semester   First-semester   Verbal    College Board
number    English grade     English grade    S.A.T.    English examination
          X1                X2               X3        X4

  41        260               260              646       643
  42        180               160              511       532
  43        100                60              339       442
  44        200               220              424       594
  45        200               200              515       594
  46        160               120              460       525
  47        180               160              418       504
  48        280               220              581       684
  49        200               200              479       560
  50        220               220              579       532
  51        220               200              600       594
  52        240               220              610       497
  53        100                60              376       435
  54        220               220              458       484
  55                          200              600       497
  56        200               220              489       539
  57        220               220              567       525
  58                          200              513       636
  59                          200              644       581
  60        180               140              521       401
  61        160               140              345       442
  62        240               220              596       636
  63        260               260              667       567
  64        160               120              384       546
  65        260               240              646       511
  66        220               180              500       504
  67        220               240              613       608
  68        260               260              707       615
  69        240               220              703       657
  70        200               200              391       497
  71        140               120              416       442
  72        260               240              667       657
  73        200               180              477       664
  74        300               280              709       518
  75        180               140              527       539
  76        220               180              475       629
  77        180               180              636       594
  78                          280              543       726
  79        220               220              432       546
  80        220               200              380       428
  81        200               180              479       539














[Tables 48 and 49. Correlation tables for $X_1$ and $X_3$ and for $X_2$ and $X_3$. $X_2$ = first-semester English grade; $X_3$ = verbal scholastic-aptitude test.]


Table 50. $X_2$ and $X_3$ Grades of 16 Students Whose $X_1$ Grade Is 200

Student
number      X2      X3

   2        180      ?
  10        180     437
   ?        160      ?
   ?        200      ?
   ?         ?      443
  26        160     535
  30        200     504
  31        200     431
  39        200     619
  44        220     424
  45        200     515
  49        200     479
  56        220     489
  70        200     391
  73        180     477
  81        180     479


Fig. 119. Net regression of $X_2$ on $X_3$, holding $X_1$ constant. Test based on
$X_1$ = 200.

Figure 119 shows the scatter of these 16 bivariates; it indicates
that the regression may be linear, although there appears to be
little tendency for $X_2$ and $X_3$, unaffected by $X_1$, to be correlated
with each other. Evidently violence is not done to the facts
by assuming that any correlation present is linear in character.
It must be remembered, of course, that this preliminary test of






net regression between $X_2$ and $X_3$, holding $X_1$ constant, is based
on a small sample of only 16 observations. Similar tests, holding
$X_1$ constant at several other values, respectively, should be
made, especially if nonlinearity is suspected.

Inasmuch as Table 34 (page 358) shows such a clear linear
total regression between $X_1$ and $X_2$, it may be assumed that the
net regression of $X_1$ on $X_2$, and vice versa, is linear. For further
illustration of the method of examining the multivariate
frequency distribution to test for linearity of regression, Table 51
presents the joint variation in $X_1$ and $X_3$ grades of those students
whose $X_2$ grade is 220. This will make it possible to test the
linearity of net regression between $X_1$ and $X_3$, holding $X_2$
constant.

Table 51. $X_1$ and $X_3$ Grades of 18 Students Whose $X_2$ Grade Is 220

Student
number      X1      X3

   1        240     573
   6        240     567
  11        220     443
  19        240     598
  20        240     536
  33        260     509
  36        220     531
  40        220     456
  44        200     424
  48        280     581
  50        220     579
  52        240     610
  54        220     458
  56        200     489
  57        220     567
  62        240     596
  69        240     703
  79        220     432


The bivariate frequency distribution of the eighteen cases in
Table 51 is plotted in Fig. 120, which shows that their scatter,
with the exception of one case, follows a linear path. It may be
concluded that the net regression between $X_1$ and $X_3$
approximates the linear form. Here again, especially if nonlinearity is
suspected, similar tests using several different values of $X_2$,
respectively, should be made. Tables 50 and 51 serve only to
illustrate the method, which would ordinarily need to be applied
more completely than is done here.

Fig. 120. Net regression between $X_1$ and $X_3$, holding $X_2$ constant. Test
based on $X_2$ = 220.
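The subgroup of Table 51 is small enough to be summarized numerically as well as by a chart. The following Python sketch fits the least-squares line of $X_1$ on $X_3$ to the 18 cases and computes the simple r within the subgroup. It is an illustration only: the function name is hypothetical, and a fitted line by itself does not test linearity, which still calls for inspection of the scatter.

```python
from math import sqrt

# (student number, X1, X3) for the 18 students of Table 51 (X2 grade = 220)
TABLE_51 = [
    (1, 240, 573), (6, 240, 567), (11, 220, 443), (19, 240, 598),
    (20, 240, 536), (33, 260, 509), (36, 220, 531), (40, 220, 456),
    (44, 200, 424), (48, 280, 581), (50, 220, 579), (52, 240, 610),
    (54, 220, 458), (56, 200, 489), (57, 220, 567), (62, 240, 596),
    (69, 240, 703), (79, 220, 432),
]

def simple_fit(rows):
    """Least-squares line of X1 on X3 and the simple r within the subgroup."""
    n = len(rows)
    mean_x1 = sum(x1 for _, x1, _ in rows) / n
    mean_x3 = sum(x3 for _, _, x3 in rows) / n
    s13 = sum((x1 - mean_x1) * (x3 - mean_x3) for _, x1, x3 in rows)
    s11 = sum((x1 - mean_x1) ** 2 for _, x1, _ in rows)
    s33 = sum((x3 - mean_x3) ** 2 for _, _, x3 in rows)
    slope = s13 / s33
    intercept = mean_x1 - slope * mean_x3
    return slope, intercept, s13 / sqrt(s11 * s33)

slope, intercept, r = simple_fit(TABLE_51)
print(f"X1' = {intercept:.2f} + {slope:.4f} X3,  r = {r:.3f}")
```

Repeating the same calculation for the students at other values of $X_2$ would extend the test in the manner the text recommends.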

Statistics of the Trivariate Frequency Distribution. Study of
the trivariate frequency distribution $X_1$, $X_2$, and $X_3$ shown in
Table 47 appears to indicate that regressions are linear, and
therefore it may be assumed that the methods of correlation
outlined in Chap. XVI may appropriately be applied. In order
to calculate the statistics for a trivariate frequency distribution,
it is necessary to obtain first the statistics for the various
monovariate distributions.

Calculation of Zero-order Statistics. Correlation tables 48
and 49 may be used as work sheets, as illustrated in Chap. XIV,
to calculate the zero-order statistics. Since the problem is now
one of analyzing a trivariate frequency distribution, it is well
to set up a schedule for calculation of the respective statistics.
Table 52 is a schedule of the means and variances for the three
monovariate frequency distributions taken separately. In each
case the mean was calculated by using the formula






Table 52. Means and Variances

Means                  Sums of squares                   Variances                      Standard deviations

$\bar{X}_1 = 217.4$    $N\sigma_1^2 = 156,355.56$        $\sigma_1^2 = 1,930.3155$      $\sigma_1 = 43.94$
$\bar{X}_2 = 204.1$    $N\sigma_2^2 = 181,155.56$        $\sigma_2^2 = 2,236.4883$      $\sigma_2 = 47.29$
$\bar{X}_3 = 515.5$    $N\sigma_3^2 = 719,222.22$        $\sigma_3^2 = 8,879.2866$      $\sigma_3 = 94.23$


The sum of squares of deviations from the mean in each case is
calculated by using the formula

$$N\sigma^2 = i^2\left[\Sigma fd^2 - \frac{(\Sigma fd)^2}{N}\right]$$

The required sums are all found in the correlation tables, for
example,

$$N\sigma_1^2 = 400\left[543 - \frac{(111)^2}{81}\right] = 400(543 - 152.11111) = 156,355.56$$
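The arithmetic of the example can be retraced in a few lines of Python. This is merely a sketch: the variable names are hypothetical, and reading the factor 400 as the square of a class interval of 20 is an assumption, since the text gives only the numbers.

```python
n = 81                 # number of students
factor = 400           # multiplier outside the bracket (assumed: interval of 20, squared)
sum_sq_dev = 543       # first sum taken from the correlation table
sum_dev = 111          # second sum taken from the correlation table

n_var = factor * (sum_sq_dev - sum_dev ** 2 / n)   # N * sigma_1^2
variance = n_var / n                                # sigma_1^2
std_dev = variance ** 0.5                           # sigma_1

print(f"N*variance = {n_var:.2f}")
print(f"variance   = {variance:.4f}")
print(f"std. dev.  = {std_dev:.2f}")
```

The three results correspond to the $X_1$ entries for the sum of squares, the variance, and the standard deviation in Table 52.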


Table 53 is a schedule for the calculation of zero-order
Pearsonian coefficients of correlation, using the equation that was
used in Chap. XIV. This equation is shown at the head of Table
53. The entries all come from Tables 48 and 49 and Table 34
(page 358).

Table 53. Calculation of Simple r's

            (1)          (2)           (3)           (4)           (5)
                                                     (1) - (2)     (4) ÷ (3)

r12       453.00       76.11111     420.74964     376.88889     +0.89576
r13       283.00      -68.51852     558.90486     351.51852     +0.62894
r23       322.00      -35.18519     601.59915     357.18519     +0.59373

Calculation of First-order Statistics. As suggested in Chap.
XVI, the first-order statistics of a trivariate frequency
distribution may be calculated by several methods. The most efficient
method appears to be to calculate the first-order $\beta$'s and from
the first-order $\beta$'s to calculate the other first-order statistics.
Table 54 is a work sheet for the orderly calculation of the
first-order $\beta$'s in the illustrated trivariate frequency distribution.

Table 54. Calculation of the First-order $\beta$'s from the
Zero-order r's¹
[See Chap. XVI, Eqs. (9)]

$$\beta_{ij.k} = \frac{r_{ij} - r_{ik}r_{jk}}{1 - r_{jk}^2}$$

        (1)                  (2)            (3)          (4)              (5)
  Zero-order r ($\beta$)     Product        Whole        $1 - r^2$    First-order $\beta$
Subscript  Regression        term of        numerator                Subscript  Regression
           statistic         numerator                                          statistic

   12      0.89576           0.37342        0.52234      0.64748      12.3      0.80673
   13      0.62894
   23      0.59373

   13      0.62894           0.53184        0.09710      0.64748      13.2      0.14997
   12      0.89576
   23      0.59373

   12      0.89576           0.37342        0.52234      0.60444      21.3      0.86417
   23      0.59373
   13      0.62894

   23      0.59373           0.56338        0.03035      0.60444      23.1      0.05021
   12      0.89576
   13      0.62894

   13      0.62894           0.53184        0.09710      0.19762      31.2      0.49135
   23      0.59373
   12      0.89576

   23      0.59373           0.56338        0.03035      0.19762      32.1      0.15358
   13      0.62894
   12      0.89576

¹ Note the internal checks in columns (2), (3), and (4), in which each of three
values occurs twice: in column (2) the first and third, second and fifth, fourth
and sixth figures check; in column (3) the same orders check; in column (4) the
first and second, the third and fourth, and the fifth and sixth figures check.
While not independent checks, they nevertheless give confidence in the accuracy
of the work as it proceeds.

If preferred, the b's instead of the $\beta$'s could first be calculated by using a
similar table and the general formula

$$b_{ij.k} = \frac{b_{ij} - b_{ik}b_{kj}}{1 - b_{jk}b_{kj}}$$

The entries in column (1) of the table are obtained from Table 53.
Bearing in mind the symmetry in the formula shown at the head
of Table 54, the zero-order r's, which are also the zero-order $\beta$'s,
are copied in the order in which they occur in the formula.
Consequently, the entries in column (2) are the products of the
r's in the second and third lines of each trio of r's in column (1).
The entry in column (3) is the first r of each trio minus the
entry in column (2). The $1 - r^2$ in column (4), which may be
found by using a sine table, is for the third r in each trio of r's
in column (1). Thus, if the trios of r's are properly arranged
in column (1), which can be done by following the general formula
at the head of the table, the symmetry of the work sheet
facilitates all necessary calculations. In using this work sheet, the
first step is to write in column (5) the subscript for the first-order
$\beta$ that is to be calculated; this subscript then determines
the order of the zero-order r's in column (1). The value of the
first-order $\beta$, entered in column (5), is found by dividing the entry
in column (3) by the corresponding entry in column (4).
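The work-sheet routine described above can be written out column by column. The following Python sketch is an illustration; the function and dictionary names are hypothetical, and the results may differ from the printed values in the last decimal because the hand work rounds each column.

```python
def first_order_beta(r_first, r_second, r_third):
    """One line of the work sheet for a trio of zero-order r's."""
    product = r_second * r_third        # column (2)
    numerator = r_first - product       # column (3)
    one_minus_r2 = 1 - r_third ** 2     # column (4)
    return numerator / one_minus_r2     # column (5)

r = {"12": 0.89576, "13": 0.62894, "23": 0.59373}   # from Table 53

# Subscript of each first-order beta and its trio of r's, in work-sheet order
trios = {
    "12.3": ("12", "13", "23"),
    "13.2": ("13", "12", "23"),
    "21.3": ("12", "23", "13"),
    "23.1": ("23", "12", "13"),
    "31.2": ("13", "23", "12"),
    "32.1": ("23", "13", "12"),
}

betas = {name: first_order_beta(*(r[s] for s in subs))
         for name, subs in trios.items()}
for name, value in betas.items():
    print(f"beta {name} = {value:.5f}")
```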

The coefficients of partial correlation are readily calculated
from the $\beta$'s, as follows:¹

$$r_{ij.k}^2 = \beta_{ij.k}\beta_{ji.k}$$

$$r_{12.3}^2 = \beta_{12.3}\beta_{21.3} = 0.80673(0.86417) = 0.69715 \qquad r_{12.3} = 0.83496$$

$$r_{13.2}^2 = \beta_{13.2}\beta_{31.2} = 0.14997(0.49135) = 0.07369 \qquad r_{13.2} = 0.27146$$

$$r_{23.1}^2 = \beta_{23.1}\beta_{32.1} = 0.05021(0.15358) = 0.007711 \qquad r_{23.1} = 0.08782$$
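A short Python sketch of the same step is given below for illustration. The function name is hypothetical; the sign of the partial r is taken from its $\beta$, in keeping with the statement that the partial r always has the same sign as its corresponding $\beta$.

```python
from math import copysign, sqrt

def partial_r(beta_ij_k, beta_ji_k):
    """Partial r as the square root of the product of the paired betas."""
    return copysign(sqrt(beta_ij_k * beta_ji_k), beta_ij_k)

r12_3 = partial_r(0.80673, 0.86417)
r13_2 = partial_r(0.14997, 0.49135)
r23_1 = partial_r(0.05021, 0.15358)
print(f"{r12_3:.5f} {r13_2:.5f} {r23_1:.5f}")
```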


¹ The coefficients of partial correlation could be checked by using any
one of several formulas, as follows:

$$r_{ij.k} = \frac{r_{ij} - r_{ik}r_{jk}}{\sqrt{1 - r_{ik}^2}\,\sqrt{1 - r_{jk}^2}}$$

$$r_{ij.k} = b_{ij.k}\frac{\sigma_{j.k}}{\sigma_{i.k}}$$

$$r_{ij.k} = \beta_{ij.k}\frac{\sqrt{1 - r_{jk}^2}}{\sqrt{1 - r_{ik}^2}}$$

These formulas all have the advantage that they determine the positive
or negative sign of the partial r; but the partial r always has the same sign
as its corresponding $\beta$. Cf. also p. 460.





Thus is determined the arithmetical value of the first-order
coefficients of partial correlation. Each coefficient of partial
correlation is positive if the $\beta$'s from which it is derived are both
positive, negative if the $\beta$'s are negative. The respective pairs of
$\beta$'s involved are never of opposite sign.
The b statistics are calculated from the $\beta$'s as follows:


$$b_{ij.k} = \beta_{ij.k}\frac{\sigma_i}{\sigma_j}$$

$$b_{12.3} = 0.80673\,\frac{43.9354}{47.2915} = 0.80673(0.92903) = 0.74948$$

$$b_{13.2} = 0.14997\,\frac{43.9354}{94.2300} = 0.14997(0.46626) = 0.06992$$

$$b_{21.3} = 0.86417\,\frac{47.2915}{43.9354} = 0.86417(1.07639) = 0.93020$$

$$b_{23.1} = 0.05021\,\frac{47.2915}{94.2300} = 0.05021(0.50187) = 0.02520$$

$$b_{31.2} = 0.49135\,\frac{94.2300}{43.9354} = 0.49135(2.1447) = 1.05380$$

$$b_{32.1} = 0.15358\,\frac{94.2300}{47.2915} = 0.15358(1.9925) = 0.30601$$


The first-order a statistics are calculated as follows:

$$a_{i.jk} = \bar{X}_i - b_{ij.k}\bar{X}_j - b_{ik.j}\bar{X}_k$$

$$a_{1.23} = 217.4074 - 0.74948(204.074) - 0.06992(515.4816) = 28.4156$$

$$a_{2.13} = 204.074 - 0.93020(217.4074) - 0.02520(515.4816) = -11.148$$

$$a_{3.12} = 515.4816 - 1.05380(217.4074) - 0.30601(204.074) = 223.929$$

The equations of the three planes of regression are, therefore,
as follows:

$$X_1' = 28.42 + 0.75X_2 + 0.07X_3$$
$$X_2' = -11.15 + 0.93X_1 + 0.025X_3$$
$$X_3' = 223.93 + 1.05X_1 + 0.31X_2$$
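The passage from the $\beta$'s to the b's and a's can be sketched in Python as follows. This is an illustration only: the dictionary layout is hypothetical (each key (i, j) stands for the coefficient of $X_j$ in the plane for $X_i$, the third variable being held constant), and the results agree with the printed figures only to about the third decimal because of rounding in the hand work.

```python
sd = {1: 43.9354, 2: 47.2915, 3: 94.2300}       # standard deviations
mean = {1: 217.4074, 2: 204.074, 3: 515.4816}   # means

# First-order betas from Table 54; key (i, j) means beta_ij.k
beta = {(1, 2): 0.80673, (1, 3): 0.14997,
        (2, 1): 0.86417, (2, 3): 0.05021,
        (3, 1): 0.49135, (3, 2): 0.15358}

# b_ij.k = beta_ij.k * sigma_i / sigma_j
b = {(i, j): value * sd[i] / sd[j] for (i, j), value in beta.items()}

def intercept(i):
    """a for the plane of regression of X_i on the other two variables."""
    return mean[i] - sum(b[(i, j)] * mean[j] for j in (1, 2, 3) if j != i)

for i in (1, 2, 3):
    terms = " + ".join(f"{b[(i, j)]:.4f} X{j}" for j in (1, 2, 3) if j != i)
    print(f"X{i}' = {intercept(i):.2f} + {terms}")
```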





If $X_1$ is considered the dependent variable, it can be estimated
from the first equation; if $X_2$ is considered the dependent
variable, it can be estimated from the second equation; if $X_3$ is
considered the dependent variable, it can be estimated from the
third equation. The second-order standard deviations,
respectively, about the three planes of regression may also be
calculated.¹

$$\sigma_{i.jk}^2 = \sigma_i^2(1 - r_{ij}^2)(1 - r_{ik.j}^2)$$

$$\sigma_{1.23}^2 = \sigma_1^2(1 - r_{12}^2)(1 - r_{13.2}^2) = 1,930.3155(0.19762)(0.92631) = 353.3585 \qquad \sigma_{1.23} = 18.7975$$

$$\sigma_{2.13}^2 = \sigma_2^2(1 - r_{12}^2)(1 - r_{23.1}^2) = 2,236.4883(0.19762)(0.99229) = 438.5672 \qquad \sigma_{2.13} = 20.9416$$

$$\sigma_{3.12}^2 = \sigma_3^2(1 - r_{13}^2)(1 - r_{23.1}^2) = 8,879.2866(0.60444)(0.99229) = 5,325.616 \qquad \sigma_{3.12} = 72.9767$$

The multiple-correlation coefficients, which also measure the
goodness of fit of the planes of regression, may now be calculated
as follows:²

$$R_{i.jk}^2 = 1 - \frac{\sigma_{i.jk}^2}{\sigma_i^2}$$

$$R_{1.23}^2 = 1 - \frac{353.358}{1,930.3155} = 1 - 0.1830 = 0.8170 \qquad R_{1.23} = 0.9039$$

$$R_{2.13}^2 = 1 - \frac{438.567}{2,236.4883} = 1 - 0.1961 = 0.8039 \qquad R_{2.13} = 0.8966$$

$$R_{3.12}^2 = 1 - \frac{5,325.616}{8,879.2866} = 1 - 0.5998 = 0.4002 \qquad R_{3.12} = 0.6326$$

¹ They could also be calculated by using Eq. (19), (p. 415). Thus the
calculation of $\sigma_{1.23}^2$ could be checked not only by using

$$\sigma_{1.32}^2 = \sigma_1^2(1 - r_{13}^2)(1 - r_{12.3}^2)$$

but also by using the following formula:

$$\sigma_{1.23}^2 = \sigma_1^2(1 - \beta_{12.3}r_{12} - \beta_{13.2}r_{13})$$

² The calculation of R may be checked by using Eq. (20), p. 417.
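The scatter about each plane and the corresponding multiple-correlation coefficient can be obtained together, as in the Python sketch below. The function name is hypothetical, and the inputs are the rounded figures of the text, so the results match the printed ones only to about the third decimal.

```python
from math import sqrt

def scatter_and_R(var_i, r_ij, r_ik_j):
    """Standard deviation about the plane of regression and multiple R."""
    var_resid = var_i * (1 - r_ij ** 2) * (1 - r_ik_j ** 2)
    R_squared = 1 - var_resid / var_i
    return sqrt(var_resid), sqrt(R_squared)

sigma1_23, R1_23 = scatter_and_R(1930.3155, 0.89576, 0.27146)
sigma2_13, R2_13 = scatter_and_R(2236.4883, 0.89576, 0.08782)
sigma3_12, R3_12 = scatter_and_R(8879.2866, 0.62894, 0.08782)

print(f"sigma 1.23 = {sigma1_23:.3f}, R 1.23 = {R1_23:.4f}")
print(f"sigma 2.13 = {sigma2_13:.3f}, R 2.13 = {R2_13:.4f}")
print(f"sigma 3.12 = {sigma3_12:.3f}, R 3.12 = {R3_12:.4f}")
```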


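An all-round check on the various calculations may be obtained
by using Eq. (31), Chap. XVI, as follows:

TABLE 55

$$1 = \beta_{12.3}^2 + \beta_{13.2}^2 + 2r_{23}\beta_{12.3}\beta_{13.2} + \frac{\sigma_{1.23}^2}{\sigma_1^2}$$
$$1 = (0.80673)^2 + (0.14997)^2 + 2(0.59373)(0.80673)(0.14997) + \frac{\sigma_{1.23}^2}{\sigma_1^2}$$
$$1.0000 = 0.65081 + 0.02249 + 0.14367 + 0.18304$$

$$1 = \beta_{23.1}^2 + \beta_{21.3}^2 + 2r_{13}\beta_{23.1}\beta_{21.3} + \frac{\sigma_{2.13}^2}{\sigma_2^2}$$
$$1 = (0.05021)^2 + (0.86417)^2 + 2(0.62894)(0.05021)(0.86417) + \frac{\sigma_{2.13}^2}{\sigma_2^2}$$
$$1.0000 = 0.00252 + 0.74679 + 0.05458 + 0.19608$$

$$1 = \beta_{32.1}^2 + \beta_{31.2}^2 + 2r_{12}\beta_{32.1}\beta_{31.2} + \frac{\sigma_{3.12}^2}{\sigma_3^2}$$
$$1 = (0.15358)^2 + (0.49135)^2 + 2(0.89576)(0.15358)(0.49135) + \frac{\sigma_{3.12}^2}{\sigma_3^2}$$
$$1.0000 = 0.02359 + 0.24142 + 0.13519 + 0.59978$$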
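The first of these checks may be reproduced with a few lines of Python. This is a sketch only; the function name `variance_shares` is an assumption introduced for illustration, and the inputs are the first-order $\beta$'s, the simple correlation $r_{23}$, and the unexplained proportion given in the text.

```python
def variance_shares(beta_a, beta_b, r_ab, residual_share):
    """Split the variance of the dependent variable into the two beta
    squares, the beta cross product, and the unexplained remainder."""
    direct_a = beta_a ** 2
    direct_b = beta_b ** 2
    joint = 2.0 * r_ab * beta_a * beta_b
    return direct_a, direct_b, joint, residual_share


# Plane of regression of X_1 on X_2 and X_3 (values from the text).
shares = variance_shares(0.80673, 0.14997, 0.59373, 0.18304)
total = sum(shares)

print([round(s, 5) for s in shares])
print(round(total, 4))
```

If the $\beta$'s, the correlation coefficient, and the scatter have all been computed correctly, the four shares should add to unity within rounding error.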

Interpretation of Results Illustrated. The interpretation of
the above statistics of a trivariate frequency distribution may be
illustrated by assuming that it is desired to predict $X_1$, the
second-semester grades of freshmen at the woman's college
selected. From the equation for the plane of regression of $X_1$
on $X_2$ and $X_3$, namely, $X_1' = 28.42 + 0.75X_2 + 0.07X_3$,
estimates may be made of a freshman's grade in second-semester
English if her grades in the verbal scholastic-aptitude test and
in first-semester English are known.

Estimates Based on Regression Equation. If a freshman's
grade in first-semester English were 300 and her grade in the
verbal scholastic-aptitude test were 600, her second-term English
grade would be estimated at

$$X_1' = 28.42 + 0.75(300) + 0.07(600) = 28.42 + 225 + 42 = 295$$
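The same estimate can be written as a small Python function. The sketch below is illustrative only; the function and argument names are assumptions, while the constants are the rounded coefficients of the regression equation for $X_1$ on $X_2$ and $X_3$.

```python
def estimate_second_semester(first_semester, aptitude):
    """Estimate X_1 from X_2 (first-semester English) and
    X_3 (verbal scholastic-aptitude test)."""
    return 28.42 + 0.75 * first_semester + 0.07 * aptitude


estimate = estimate_second_semester(300, 600)
print(round(estimate))
```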





Since the second-term English grade will, of course, be affected
by other factors, the student's actual grade in second-semester
English will deviate from estimates based upon the regression
equation. This raises the question as to how much, on the
average, it can be expected that estimates based on the regression
equation will deviate from the actual values. The answer is
found by the determination of the value of $\sigma_{1.23}$, which has been
found above to be 18.8, or approximately 19. The standard
deviation of the differences between the actual grades and
estimates based on the above regression equation is therefore about
19. If this regression equation and first-order standard deviation
are typical of these college grades and if the differences
between actual and estimated values are in general normally
distributed, the chances are about 95 in 100 that the actual value in any
particular case will fall within limits $\pm 38$ $(= 2\sigma_{1.23})$ from the
estimated value.
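The limits of two standard deviations, and the chance attached to them under the assumption of normality, may be sketched in Python as follows. The helper name `normal_coverage` is an assumption made for illustration; the estimate of 295 and the scatter of 19 are the rounded figures used in the text.

```python
import math


def normal_coverage(k):
    """Probability that a normally distributed deviation lies within
    k standard deviations of zero."""
    return math.erf(k / math.sqrt(2.0))


estimate = 295.0   # estimated second-semester grade
scatter = 19.0     # standard deviation about the plane, rounded

lower = estimate - 2 * scatter
upper = estimate + 2 * scatter
coverage = normal_coverage(2.0)

print(lower, upper, round(coverage, 3))
```

The statement of chances holds only so far as the differences between actual and estimated grades are in fact normally distributed.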

The foregoing conclusion, which is based on the value of $\sigma_{1.23}$,
can be summarized very succinctly by the calculation of $R_{1.23}$,
which has been found to be equal to 0.9039. This is a fairly
high coefficient of multiple correlation. It shows that the above
plane of regression is a good fit, and therefore estimates based
upon it can be expected to be fairly good.

Partial-correlation Coefficients. Since both $b$ statistics in the
equation of regression are positive, it is known that the net
correlations between $X_1$ and $X_2$ and between $X_1$ and $X_3$ are
positive. The amount of the net correlation is given by the
coefficients of partial correlation $r_{12.3} = 0.83496$ and $r_{13.2} = 0.27146$.
These show that second-semester English grades are much more
closely related to first-semester English grades than they are to
verbal scholastic-aptitude test grades.

Analysis of Variance in $X_1$. From the $\beta^2$'s and the cross
products, analysis of the variance in second-semester English
grades can be made. Thus, from the first set of $\beta^2$'s and cross
products in Table 55, it is seen that 65.1 per cent of the variance
in second-semester English grades, $X_1$, is accounted for by direct
association with first-semester English grades. Only 2.2 per
cent is accounted for by direct association with verbal
scholastic-aptitude test grades, although 14.4 per cent of the variance
in second-semester English is accounted for by indirect
association with both first-semester grades and verbal
scholastic-aptitude test grades. The variation in other influences accounts
for 18.3 per cent of the variance in second-semester English
grades.

Under conditions existing at the woman's college studied, it 
appears to be an inevitable conclusion that knowledge of grades 
in verbal scholastic-aptitude tests is not so helpful as might be 
supposed in predicting the subsequent performance of college 
freshmen students. 

Extension of Analysis to Include Four Variables. Additional
Zero-order Statistics. The extension of the trivariate frequency
distribution to include a fourth variable $X_4$ requires first the
calculation of the mean and standard deviation of the added
variable. It requires also the calculation of the simple
correlation coefficients between the new variable and each of the
other three. For illustration, the fourth variable taken is the
grade in the College Board English examination. Tables 56 to
58 are the usual work sheets for a correlation problem. From
them the necessary data are obtained for calculating the
additional zero-order statistics, as follows:


$$\bar{X}_4 = 549.8889 \qquad r_{14} = 0.49106$$
$$N\sigma_4^2 = 501,201.9828 \qquad r_{24} = 0.48807$$
$$\sigma_4^2 = 6,187.6788 \qquad r_{34} = 0.31551$$
$$\sigma_4 = 78.6618$$



Additional First-order Statistics. Among four variables it is
possible to distinguish four different sets of trivariate frequency
distributions, each of which will have three planes of regression.
Accordingly, when four variables are involved the total number
of first-order statistics is 24, two for each plane of regression. Six
of these twenty-four were calculated in Table 54; the remaining
18 may be obtained by a similar procedure. Table 59 shows the
24 $\beta$'s for the illustrated four-variable problem, grouped according
to the four possible trivariate frequency distributions.
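The count of 24 may be confirmed by listing the subscripts mechanically. The following Python sketch is illustrative only (the variable names are assumptions); it forms every trio of the four variables, takes each member of the trio in turn as the dependent variable, and writes the two first-order subscripts belonging to that plane.

```python
from itertools import combinations

variables = (1, 2, 3, 4)
first_order = []

for trio in combinations(variables, 3):      # four trivariate distributions
    for dependent in trio:                   # three planes in each
        a, b = [v for v in trio if v != dependent]
        # Two betas per plane: one for each independent variable,
        # with the remaining variable held constant.
        first_order.append(f"{dependent}{a}.{b}")
        first_order.append(f"{dependent}{b}.{a}")

print(len(first_order))
print(first_order[:6])
```

The first six subscripts listed are those of the trivariate distribution of $X_1$, $X_2$, and $X_3$.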

Each of the four trivariate frequency distributions could be
analyzed as illustrated in the preceding sections of this chapter.
From the first-order $\beta$'s shown in Table 59 all the other
first-order statistics may be obtained, by methods already explained.

In few problems is it necessary or even desirable to calculate
all 24 first-order $\beta$ statistics of the four trivariate frequency
distributions involved in a four-variable set. As may be seen from
an examination of Table 60, it is possible to calculate all the
second-order $\beta$ statistics if only 18 of the 24 first-order $\beta$ statistics
are known. If only one of the four planes of regression in the
four-variable correlation problem is significant or important, it is
necessary to calculate only 8 of the first-order $\beta$ statistics.

TABLE 59. THE FIRST-ORDER $\beta$'s IN THE FOUR TRIVARIATE FREQUENCY
DISTRIBUTIONS FOR FOUR VARIABLES
Data on four kinds of grades of college freshmen at the selected woman's
college

First plane                    Second plane                   Third plane

Trivariate distribution $X_1$, $X_2$, $X_3$
$\beta_{12.3}$ = 0.80673       $\beta_{21.3}$ = 0.86417       $\beta_{31.2}$ = 0.49135
$\beta_{13.2}$ = 0.14997       $\beta_{23.1}$ = 0.05021       $\beta_{32.1}$ = 0.15358

Trivariate distribution $X_1$, $X_2$, $X_4$
$\beta_{12.4}$ = 0.86124       $\beta_{21.4}$ = 0.86457       $\beta_{41.2}$ = 0.27260
$\beta_{14.2}$ = 0.07071       $\beta_{24.1}$ = 0.06352       $\beta_{42.1}$ = 0.24392

Trivariate distribution $X_1$, $X_3$, $X_4$
$\beta_{13.4}$ = 0.52642       $\beta_{31.4}$ = 0.62463       $\beta_{41.3}$ = 0.48413
$\beta_{14.3}$ = 0.32498       $\beta_{34.1}$ = 0.00877       $\beta_{43.1}$ = 0.01101

Trivariate distribution $X_2$, $X_3$, $X_4$
$\beta_{23.4}$ = 0.48837       $\beta_{32.4}$ = 0.57724       $\beta_{42.3}$ = 0.46447
$\beta_{24.3}$ = 0.33400       $\beta_{34.2}$ = 0.03378       $\beta_{43.2}$ = 0.03974

Second-order Statistics in a Four-variable Problem. In the
four-variable correlation problem statistics for four planes of
regression may be obtained. Following are the four possible
regression equations:

$$X_1' = a_{1.234} + b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4$$
$$X_2' = a_{2.134} + b_{21.34}X_1 + b_{23.14}X_3 + b_{24.13}X_4$$
$$X_3' = a_{3.124} + b_{31.24}X_1 + b_{32.14}X_2 + b_{34.12}X_4$$
$$X_4' = a_{4.123} + b_{41.23}X_1 + b_{42.13}X_2 + b_{43.12}X_3$$

Also, for each plane of regression a scatter and a coefficient of
multiple correlation may be calculated. The procedure is
similar to that already illustrated; that is to say, the second-order
$\beta$'s are first obtained, and from them all the other second-order
statistics are calculated. Table 60 illustrates the procedure for
making the necessary calculations to obtain the 12 possible
second-order $\beta$ statistics.

Calculation of Second-order Statistics. In a problem where the
first-order partial coefficients of correlation are already calculated,
it is advisable to modify the formula for finding second-order
$\beta$ statistics from first-order $\beta$ statistics as follows:

According to Eq. (29), Chap. XVI, it was found that

$$\beta_{ij.kn} = \frac{\beta_{ij.k} - \beta_{in.k}\beta_{nj.k}}{1 - \beta_{jn.k}\beta_{nj.k}}$$

But from Eq. (24), Chap. XVI, it is known that

$$r_{jn.k}^2 = \beta_{jn.k}\beta_{nj.k}$$

Accordingly, the formula for finding the second-order $\beta$ statistics
can be modified as follows:

$$\beta_{ij.kn} = \frac{\beta_{ij.k} - \beta_{in.k}\beta_{nj.k}}{1 - r_{jn.k}^2}$$

In order to secure the greatest convenience in calculation, the
arrangement of the items in the work sheet (Table 60) is according
to the terms of this formula. First the desired subscript for
the $\beta$ statistic to be calculated is entered in column (5); then,
following the formula, the order in which the required trio of
first-order $\beta$'s appear in column (1) is determined. If this order
is followed, the entry in column (2) is the product of the second
two $\beta$'s of the trio in column (1); the entry in column (3) is
found by subtracting the entry in column (2) from the first $\beta$ of
the trio in column (1); the subscript of the third $\beta$ of the trio in
column (1) is the subscript of the partial $r$ for which $1 - r^2$ is to
be found in appropriate tables or, if preferred, calculated. The
desired second-order $\beta$'s are then calculated, by dividing the
entry in column (3) by the entry in column (4), and entered in
column (5).
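The work-sheet routine just described amounts to a single line of arithmetic for each second-order $\beta$. The following Python sketch is offered as an illustration under that reading; the function name and argument names are assumptions, and the four inputs are the first-order $\beta$'s needed for $\beta_{12.34}$ as given in Table 59.

```python
def second_order_beta(beta_ij_k, beta_in_k, beta_nj_k, beta_jn_k):
    """beta_{ij.kn} from four first-order betas that all hold k constant."""
    product = beta_in_k * beta_nj_k            # product term of numerator
    numerator = beta_ij_k - product            # whole numerator
    denominator = 1.0 - beta_jn_k * beta_nj_k  # 1 - r_{jn.k}^2
    return numerator / denominator


# i = 1, j = 2, k = 3, n = 4:
# beta_12.3, beta_14.3, beta_42.3, beta_24.3
beta_12_34 = second_order_beta(0.80673, 0.32498, 0.46447, 0.33400)
print(round(beta_12_34, 5))
```

When the partial $r$ is already known, the fourth argument can be dispensed with by substituting $1 - r_{jn.k}^2$ directly for the denominator.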

TABLE 60. CALCULATION OF THE SECOND-ORDER $\beta$'s FROM THE FIRST-ORDER $\beta$'s

$$\beta_{ij.kn} = \frac{\beta_{ij.k} - \beta_{in.k}\beta_{nj.k}}{1 - r_{jn.k}^2} \quad \text{in which} \quad r_{jn.k}^2 = \beta_{jn.k}\beta_{nj.k}$$

[See Chap. XVI, Eqs. (24) and (29).]

        (1)                (2)          (3)         (4)              (5)
   First-order β        Product       Whole      1 - r^2        Second-order β
Subscript  Statistic    term of     numerator                Subscript  Statistic
                       numerator

  12.3      0.80673     0.15094      0.65579     0.84487       12.34     0.77620
  14.3      0.32498
  42.3      0.46447

  13.2      0.14997     0.00281      0.14716     0.99866       13.24     0.14736
  14.2      0.07071
  43.2      0.03974

  14.2      0.07071     0.00507      0.06564     0.99866       14.23     0.06573
  13.2      0.14997
  34.2      0.03378

  21.3      0.86417     0.16170      0.70247     0.84267       21.34     0.83362
  24.3      0.33400
  41.3      0.48413

  23.1      0.05021     0.00070      0.04951     0.99990       23.14     0.04951
  24.1      0.06352
  43.1      0.01101

  24.1      0.06352     0.00044      0.06308     0.99990       24.13     0.06309
  23.1      0.05021
  34.1      0.00877

  31.2      0.49135     0.00921      0.48214     0.98072       31.24     0.49162
  34.2      0.03378
  41.2      0.27260

  32.1      0.15358     0.00214      0.15144     0.98451       32.14     0.15382
  34.1      0.00877
  42.1      0.24392

  34.2      0.03378     0.03474     -0.00096     0.98072       34.21    -0.00098
  31.2      0.49135
  14.2      0.07071

  41.2      0.27260     0.01953      0.25307     0.92631       41.23     0.27320
  43.2      0.03974
  31.2      0.49135

  42.1      0.24392     0.00169      0.24223     0.99229       42.13     0.24411
  43.1      0.01101
  32.1      0.15358

  43.1      0.01101     0.01225     -0.00124     0.99229       43.12    -0.00125
  42.1      0.24392
  23.1      0.05021

In problems for which it is not desired to calculate the first-order
coefficients of partial correlation, the alternative method
illustrated in Table 61 may be used. It is to be noted that the
only differences are that an additional $\beta$ must be entered in
column (1) in each of the sets and that an additional column,
column (4), is required in which to enter the product term of the
denominator. The item in column (5) is then obtained by
taking the complement of the corresponding entry in column (4).
The second-order $\beta$ is found by dividing the entry in column (3)
by the entry in column (5). For convenience of arrangement,
the product term of the numerator is written in the order
$\beta_{in.k}\beta_{nj.k}$ rather than the reverse, and the product term of the
denominator is arranged in the order $\beta_{nj.k}\beta_{jn.k}$ rather than
the reverse. Except for the convenience in arrangement of the
work sheet, the order in which such product terms occur is
immaterial; but, when arranged as indicated, once the subscript
of the desired second-order $\beta$ is entered in column (6), the order
in which the first-order $\beta$'s occur in the equation may be followed
in entering them in column (1). There are only four first-order
$\beta$'s in each set, for the third (in the numerator) is repeated in the
first part of the product term of the denominator. When this
procedure as to arrangement in the work sheet is followed, the
entry in column (2) is always the product of the two middle $\beta$'s
in the set of four in column (1), and the entry in column (4) is
always the product of the last two $\beta$'s entered in column (1).

TABLE 61. CALCULATION OF THE SECOND-ORDER $\beta$'s FROM THE FIRST-ORDER $\beta$'s
(Alternative method illustrated)*

$$\beta_{ij.kn} = \frac{\beta_{ij.k} - \beta_{in.k}\beta_{nj.k}}{1 - \beta_{nj.k}\beta_{jn.k}}$$

        (1)               (2)         (3)         (4)          (5)              (6)
   First-order β       Product      Whole       Product       Whole        Second-order β
Subscript  Statistic   term of    numerator     term of    denominator   Subscript  Statistic
                      numerator               denominator

  12.3      0.80673    0.15094     0.65579     0.155133     0.84487       12.34     0.77620
  14.3      0.32498
  42.3      0.46447
  24.3      0.33400

  13.2      0.14997    0.00281     0.14716     0.001342     0.99866       13.24     0.14736
  14.2      0.07071
  43.2      0.03974
  34.2      0.03378

  14.2      0.07071    0.00507     0.06564     0.001342     0.99866       14.23     0.06573
  13.2      0.14997
  34.2      0.03378
  43.2      0.03974

* If this method is used, the $b$'s instead of the $\beta$'s could be first
calculated, using a similar table and the general formula

$$b_{ij.kn} = \frac{b_{ij.k} - b_{in.k}b_{nj.k}}{1 - b_{jn.k}b_{nj.k}}$$
The second-order coefficients of partial correlation are
calculated from the second-order $\beta$'s as follows:¹

$$r_{ij.kn}^2 = \beta_{ij.kn}\beta_{ji.kn}$$

or, for the four-variable case,

$$r_{12.34}^2 = 0.77620(0.83362) = 0.647056 \qquad r_{12.34} = 0.80440$$
$$r_{13.24}^2 = 0.14736(0.49162) = 0.072445 \qquad r_{13.24} = 0.26916$$
$$r_{14.23}^2 = 0.06573(0.27320) = 0.017957 \qquad r_{14.23} = 0.13400$$
$$r_{24.13}^2 = 0.06309(0.24411) = 0.015401 \qquad r_{24.13} = 0.12410$$
$$r_{23.14}^2 = 0.04951(0.15382) = 0.007616 \qquad r_{23.14} = 0.08728$$
$$r_{34.12}^2 = -0.00098(-0.00125) = 0.000001225 \qquad r_{34.12} = -0.00111$$

(The negative sign of the partial $r$ is determined by the negative
sign of the corresponding $\beta$ statistic.)
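The rule that the partial $r$ is the square root of the product of the two corresponding $\beta$'s, with the sign of the $\beta$'s, can be sketched in Python as follows. The function name `partial_r` is an assumption made for illustration; the inputs are second-order $\beta$'s from Table 60.

```python
import math


def partial_r(beta_ij, beta_ji):
    """Partial r as the signed square root of the product of the two
    betas that carry the same primary subscripts in opposite order."""
    r_squared = beta_ij * beta_ji
    sign = -1.0 if beta_ij < 0 else 1.0   # r takes the sign of the betas
    return sign * math.sqrt(r_squared)


r_12_34 = partial_r(0.77620, 0.83362)
r_14_23 = partial_r(0.06573, 0.27320)
r_34_12 = partial_r(-0.00098, -0.00125)

print(round(r_12_34, 5), round(r_14_23, 5), round(r_34_12, 5))
```

The sketch assumes that the two $\beta$'s have the same sign, as they must when both are computed from the same data.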

The $b$ statistics of the second order are calculated from the
second-order $\beta$'s in the same way as the first-order $b$'s from the
first-order $\beta$'s, by the formula

$$b_{ij.kn} = \beta_{ij.kn}\frac{\sigma_i}{\sigma_j}$$

or, for the four-variable problem,

$$b_{12.34} = \beta_{12.34}\frac{\sigma_1}{\sigma_2} = 0.77620(0.92903) = 0.72111$$
$$b_{13.24} = \beta_{13.24}\frac{\sigma_1}{\sigma_3} = 0.14736(0.46626) = 0.06871$$
$$b_{14.23} = 0.06573\frac{\sigma_1}{\sigma_4} = 0.06573(0.55854) = 0.03671$$
$$b_{21.34} = \beta_{21.34}\frac{\sigma_2}{\sigma_1} = 0.83362(1.07639) = 0.89730$$
$$b_{23.14} = \beta_{23.14}\frac{\sigma_2}{\sigma_3} = 0.04951(0.50187) = 0.02485$$
$$b_{24.13} = 0.06309\frac{\sigma_2}{\sigma_4} = 0.06309(0.60120) = 0.03793$$
$$b_{34.12} = -0.00098\frac{\sigma_3}{\sigma_4} = -0.00098(1.19791) = -0.00117$$
$$b_{31.24} = \beta_{31.24}\frac{\sigma_3}{\sigma_1} = 0.49162(2.1447) = 1.05438$$
$$b_{32.14} = \beta_{32.14}\frac{\sigma_3}{\sigma_2} = 0.15382(1.9925) = 0.30650$$
$$b_{41.23} = 0.27320\frac{\sigma_4}{\sigma_1} = 0.27320(1.79040) = 0.48914$$
$$b_{42.13} = 0.24411\frac{\sigma_4}{\sigma_2} = 0.24411(1.66334) = 0.40604$$
$$b_{43.12} = -0.00125\frac{\sigma_4}{\sigma_3} = -0.00125(0.83478) = -0.00104$$

It will be noted that, with the exception of those involving $\sigma_4$,
the standard-deviation ratios used in the above calculations
have all been computed and may be copied from the preceding
section, where the first-order $b$'s were calculated from the
first-order $\beta$'s.

¹ For checking or alternative formulas to find the partial coefficients of
correlation, see p. 447.
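The conversion from $\beta$'s to $b$'s may be sketched in Python by computing each standard-deviation ratio from the variances quoted earlier in the chapter. The dictionary `variance` and the function name `regression_b` are assumptions made for this illustration.

```python
import math

# Variances of X_1 to X_4 as quoted in the text.
variance = {1: 1930.3155, 2: 2236.4883, 3: 8879.2866, 4: 6187.6788}


def regression_b(beta, i, j):
    """b_{ij.kn} = beta_{ij.kn} * sigma_i / sigma_j."""
    return beta * math.sqrt(variance[i] / variance[j])


b_12_34 = regression_b(0.77620, 1, 2)
b_31_24 = regression_b(0.49162, 3, 1)
b_41_23 = regression_b(0.27320, 4, 1)

print(round(b_12_34, 5), round(b_31_24, 4), round(b_41_23, 5))
```

Because the ratio is taken here to full precision, the results may differ from the printed values in the last decimal place.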

The second-order $a$ statistics are calculated as follows:

$$a_{i.jkn} = \bar{X}_i - b_{ij.kn}\bar{X}_j - b_{ik.jn}\bar{X}_k - b_{in.jk}\bar{X}_n$$

$$a_{1.234} = 217.4074 - 0.72111(204.074) - 0.06871(515.4816) - 0.03671(549.8889) = 14.64244$$

$$a_{2.134} = 204.074 - 0.89730(217.4074) - 0.02485(515.4816) - 0.03793(549.8889) = -24.67260$$

$$a_{3.124} = 515.4816 - 1.05438(217.4074) - 0.30650(204.074) + 0.00117(549.8889) = 224.34628$$

$$a_{4.123} = 549.8889 - 0.40604(204.074) + 0.00104(515.4816) - 0.48914(217.4074) = 361.22013$$
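The constant term of each plane is simply the mean of the dependent variable less the slopes times the means of the independent variables. A brief Python sketch, with the dictionary `means` and the function name `intercept` assumed for illustration, reproduces the first of the four constants.

```python
# Means of X_1 to X_4 as used in the text.
means = {1: 217.4074, 2: 204.074, 3: 515.4816, 4: 549.8889}


def intercept(dependent, slopes):
    """a = mean of the dependent variable minus the sum of each
    slope times the mean of its independent variable."""
    return means[dependent] - sum(b * means[j] for j, b in slopes.items())


a_1_234 = intercept(1, {2: 0.72111, 3: 0.06871, 4: 0.03671})
a_3_124 = intercept(3, {1: 1.05438, 2: 0.30650, 4: -0.00117})

print(round(a_1_234, 3), round(a_3_124, 3))
```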

The equations for the four planes of regression may now be
written as follows:

$$X_1' = 14.64 + 0.721X_2 + 0.069X_3 + 0.037X_4$$
$$X_2' = -24.67 + 0.897X_1 + 0.025X_3 + 0.038X_4$$
$$X_3' = 224.35 + 1.05X_1 + 0.306X_2 - 0.0012X_4$$
$$X_4' = 361.22 + 0.489X_1 + 0.406X_2 - 0.001X_3$$

If $X_1$ is considered the dependent variable, it can be estimated
from the first equation; if $X_2$ is considered the dependent variable,
it can be estimated from the second equation; if $X_3$ is considered
the dependent variable, it can be estimated from the third
equation; if $X_4$ is considered the dependent variable, it can be
estimated from the fourth equation. The standard errors of
estimate, that is, the scatters, respectively, about the four
planes of regression may also be calculated:¹

$$\sigma_{1.234}^2 = \sigma_{1.23}^2 (1 - r_{14.23}^2) = 353.34(0.98204) = 346.9940 \qquad \sigma_{1.234} = 18.628$$

$$\sigma_{2.134}^2 = \sigma_{2.13}^2 (1 - r_{24.13}^2) = 438.53(0.98460) = 431.7766 \qquad \sigma_{2.134} = 20.779$$

$$\sigma_{3.124}^2 = \sigma_{3.12}^2 (1 - r_{34.12}^2) = 5,325.56(1.00000) = 5,325.56 \qquad \sigma_{3.124} = 72.976$$

$$\sigma_{4.123}^2 = \sigma_{4.12}^2 (1 - r_{43.12}^2) = 4,622.8210(1.00000) = 4,622.8210 \qquad \sigma_{4.123} = 67.991$$

¹ For alternative methods see p. 415 and Eq. (19), Chap. XVI.

The multiple-correlation coefficients, which measure the
goodness of fit of the planes of regression, are calculated in the same
way as for the trivariate problem, namely,¹

$$R_{i.jkn}^2 = 1 - \frac{\sigma_{i.jkn}^2}{\sigma_i^2}$$

$$R_{1.234}^2 = 1 - \frac{346.994}{1,930.3155} = 1 - 0.17976 = 0.8202 \qquad R_{1.234} = 0.9056$$

$$R_{2.134}^2 = 1 - \frac{431.7766}{2,236.4883} = 1 - 0.19306 = 0.8069 \qquad R_{2.134} = 0.8983$$

$$R_{3.124}^2 = 1 - \frac{5,325.56}{8,879.2866} = 1 - 0.5998 = 0.4002 \qquad R_{3.124} = 0.6326$$

$$R_{4.123}^2 = 1 - \frac{4,622.8210}{6,187.6788} = 1 - 0.74710 = 0.2529 \qquad R_{4.123} = 0.5029$$

¹ For an alternative method, see Eq. (20), Chap. XVI.

For the four-variable problem, the equation for the $\beta$ squares
and $\beta$ cross products is as follows:

$$1 = \beta_{ij.kn}^2 + \beta_{ik.jn}^2 + \beta_{in.jk}^2 + 2r_{jk}\beta_{ij.kn}\beta_{ik.jn} + 2r_{jn}\beta_{ij.kn}\beta_{in.jk} + 2r_{kn}\beta_{ik.jn}\beta_{in.jk} + \frac{\sigma_{i.jkn}^2}{\sigma_i^2}$$

In Table 62 some of these checks are illustrated.

TABLE 62

$$1 = \beta_{41.23}^2 + \beta_{42.13}^2 + \beta_{43.12}^2 + 2r_{12}\beta_{41.23}\beta_{42.13} + 2r_{13}\beta_{41.23}\beta_{43.12} + 2r_{23}\beta_{42.13}\beta_{43.12} + \frac{\sigma_{4.123}^2}{\sigma_4^2}$$
$$1.00000 = 0.07463 + 0.05959 + 0.000002 + 0.11948 + (-0.00043) + (-0.00036) + 0.74710$$

Interpretation of Results Illustrated. The interpretation of
the above statistics of a four-variable frequency distribution
may be illustrated by assuming that it is desired to predict the
second-semester English grades of freshmen at the woman's
college selected; in other words, $X_1$ is assumed to be the
dependent variable. From the equation for the plane of
regression of $X_1$ on $X_2$, $X_3$, and $X_4$, namely,

$$X_1' = 14.64 + 0.721X_2 + 0.069X_3 + 0.037X_4$$

estimates may be made of a freshman's grade in second-semester
English if her grades in the verbal scholastic-aptitude test, in
College Board English, and in the first-semester freshman
English course are known.

Estimates Based on Regression Equation. If a freshman's
grade in first-semester English is 300, in the verbal
scholastic-aptitude test 600, and in College Board English 500, her
second-semester English grade is estimated as follows:

$$X_1' = 14.64 + 0.721(300) + 0.069(600) + 0.037(500) = 14.64 + 216.3 + 41.4 + 18.5 = 291$$

Since the second-semester English grade will, of course, be
affected by other factors, the student's actual grade in
second-semester English will deviate from estimates based upon the
regression equation. This raises the question as to how much
on the average it can be expected that estimates based on the
regression equation will deviate from the actual values. The
answer is found by the determination of the value of $\sigma_{1.234}$,
which has been found above to be 18.6, or approximately 19.
The standard deviation of the differences between the actual
and the estimated grades in second-semester English is therefore
about 19. If this regression equation and second-order standard
deviation are typical of these college grades and if the differences
between actual and estimated values are in general normally
distributed, the chances are about 95 in 100 that the actual value in a
particular case will be within limits $\pm 38$ $(= 2\sigma_{1.234})$ from the
estimated value.

The foregoing conclusion, which is based on the value of
$\sigma_{1.234}$, can be summarized very succinctly by the calculation of
$R_{1.234}$, which has been found to be equal to 0.9056.

If this result is compared with the estimate based on only two
independent variables, it is found that the standard error of
estimate is almost as large for the plane based on three
independent variables as the standard error of estimate based on two
independent variables.¹ In other words, very little increase in
accuracy was obtained by including the fourth variable in
the correlation problem. This same conclusion is borne out
by comparing the coefficients of multiple correlation. Thus
$R_{1.234} = 0.9056$, while $R_{1.23} = 0.9039$, which is nearly as large,
indicating that the trivariate plane was nearly as good a fit as
the four-variable plane of regression.
Partial-correlation Coefficients. The unimportance of knowledge
of grades in College Board English examinations in
predicting the grades of freshmen in second-semester English is
explained also by the small partial-correlation coefficient between
$X_1$ and $X_4$ when $X_2$ and $X_3$ are held constant. This partial
correlation coefficient is given as $r_{14.23} = 0.1340$.

Analysis of Variance in $X_1$. These conclusions are further
indicated by the nature of the $\beta$ squares and the $\beta$ cross-product
terms. From the first equation in Table 62 it is seen that the
various proportions of variance in $X_1$ are accounted for as
follows:

60.25 per cent by correlation with first-semester English grades
2.17 per cent by correlation with verbal scholastic-aptitude
tests
0.43 per cent by correlation with College Board English
examinations
13.58 per cent by indirect correlation with first-semester English
grades and verbal scholastic-aptitude tests
4.98 per cent by indirect correlation with first-semester English
grades and College Board English examinations
0.61 per cent by indirect correlation with verbal scholastic-aptitude
tests and College Board English examinations
17.98 per cent by correlation with other factors independent of
first-semester English grades, verbal scholastic-aptitude
test grades, and College Board English examinations
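The proportions listed above follow from the second-order $\beta$'s for $X_1$ and the simple correlations among the three independent variables. A Python sketch of the computation is given below; the dictionary names are assumptions made for illustration, and the unexplained share of 0.1798 is taken from the text rather than computed.

```python
from itertools import combinations

# Second-order betas of X_1 on X_2, X_3, X_4 (Table 60) and the simple
# correlations among the independent variables (values from the text).
beta = {2: 0.77620, 3: 0.14736, 4: 0.06573}
r = {(2, 3): 0.59373, (2, 4): 0.48807, (3, 4): 0.31551}
residual = 0.1798  # share due to other factors, as given in the text

direct = {j: b * b for j, b in beta.items()}
indirect = {
    (j, k): 2.0 * r[(j, k)] * beta[j] * beta[k]
    for j, k in combinations(sorted(beta), 2)
}
total = sum(direct.values()) + sum(indirect.values()) + residual

for j, share in direct.items():
    print(f"direct, X_{j}: {100 * share:.2f} per cent")
for (j, k), share in indirect.items():
    print(f"indirect, X_{j} and X_{k}: {100 * share:.2f} per cent")
print(f"total: {100 * total:.2f} per cent")
```

The total should come to 100 per cent within rounding error, which is the check that Table 62 is intended to supply.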

The small percentages attributable to College Board English
examination grades, either directly or indirectly, are apparent
from these statistics. Evidently, under conditions existing at
the woman's college, grades on the College Board English
examination were of little value for predicting how well the
students would do in their college freshman English courses.¹

¹ Cf. p. 451.

Another approach to the study of variance in $X_1$ could be
made as follows: It was noted above for three variables that

$$\sigma_1^2 = r_{12}^2\sigma_1^2 + r_{13.2}^2\sigma_{1.2}^2 + \sigma_{1.23}^2$$

For four variables,

$$\sigma_1^2 = r_{12}^2\sigma_1^2 + r_{13.2}^2\sigma_{1.2}^2 + r_{14.23}^2\sigma_{1.23}^2 + \sigma_{1.234}^2$$

which may be expressed in proportions as follows:

$$1 = r_{12}^2 + r_{13.2}^2\frac{\sigma_{1.2}^2}{\sigma_1^2} + r_{14.23}^2\frac{\sigma_{1.23}^2}{\sigma_1^2} + \frac{\sigma_{1.234}^2}{\sigma_1^2}$$

or

$$1 = r_{12}^2 + r_{13.2}^2(1 - r_{12}^2) + r_{14.23}^2(1 - R_{1.23}^2) + (1 - R_{1.234}^2)$$

This expression means that the total variance in $X_1$ is
composed of four parts as follows: the part that is due to total simple
linear correlation with $X_2$, the part that is due to partial
correlation with $X_3$ when $X_2$ is held constant, the part that is due to
partial correlation with $X_4$ when $X_2$ and $X_3$ are held constant,
and the part due to other causes independent of $X_2$, $X_3$, and $X_4$.

The expression $r_{13.2}^2\sigma_{1.2}^2/\sigma_1^2$ describes the proportion² of the
variance in $X_1$ that is explained as a result of adding $X_3$ to the
regression equation, while $r_{14.23}^2\sigma_{1.23}^2/\sigma_1^2$ describes the proportion
of the variance in $X_1$ that is explained as a result of adding $X_4$ to
the regression equation; the influences of $X_3$ and $X_4$ that result
from their association with $X_2$ are already contained in $r_{12}^2$. By
substituting the values of the four above terms in the illustrated
problem, it becomes

$$1.00000 = 0.80238 + 0.07369\frac{\sigma_{1.2}^2}{1,930.3155} + 0.017957\frac{\sigma_{1.23}^2}{1,930.3155} + \frac{346.9940}{1,930.3155}$$

or

$$1.00000 = 0.80238 + 0.01456 + 0.00329 + 0.17977$$
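This successive breakdown may be followed step by step in Python. The sketch below is illustrative only; the variable names are assumptions, and the three squared coefficients are those quoted in the text. At each step the squared partial coefficient is applied to whatever proportion of the variance is still unexplained.

```python
# Squared coefficients as quoted in the text.
r12_sq = 0.80238       # total correlation of X_1 with X_2
r13_2_sq = 0.07369     # partial correlation with X_3, X_2 held constant
r14_23_sq = 0.017957   # partial correlation with X_4, X_2 and X_3 constant

unexplained_after_x2 = 1.0 - r12_sq
added_by_x3 = r13_2_sq * unexplained_after_x2

unexplained_after_x3 = unexplained_after_x2 - added_by_x3
added_by_x4 = r14_23_sq * unexplained_after_x3

unexplained_after_x4 = unexplained_after_x3 - added_by_x4

print(round(added_by_x3, 5), round(added_by_x4, 5),
      round(unexplained_after_x4, 5))
```

Arranged in this way, the four parts necessarily add to unity, so that the sketch serves to show the size of each part rather than to check the total.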


¹ It will be noted, however, that $r_{14} = 0.49$, so that approximately 25 per
cent [$= (0.49)^2$] of the variation in $X_1$ may be estimated from knowledge of $X_4$
alone.

² Cf. Chap. XVI, Eq. (33), p. 428.

From this expression it may be said that 80.2 per cent of the
variance in $X_1$ is accounted for by total correlation with $X_2$; a
further 1.4 per cent is accounted for by additional correlation
with $X_3$; and a further 0.3 per cent is accounted for by additional
correlation with $X_4$, the remaining 18 per cent being due to other
influences independent of $X_2$, $X_3$, and $X_4$. In other words, by
making a four- instead of a three-variable correlation problem,
that is, by including the College Board English examination
grades, only an additional 0.3 per cent of the variance in
second-semester English grades is explained.



CHAPTER XVIII
NORMAL FREQUENCY SURFACE
THE BIVARIATE HISTOGRAM

The study of frequency surfaces begins logically with a
geometrical representation of a bivariate frequency distribution
known as a "bivariate histogram." To visualize the histogram
that would represent the distribution of Table 25 (page 326),
consider an ordinary checkerboard. Let the side and top of
the board be calibrated with the class-interval scale shown in
Table 25, and let 81 checkers be taken to represent the 81
students. On the checkerboard square in the row headed 60-
and the column headed 120-, let one checker be placed; on the
square in the row headed 100- and the column headed 00-, let
two checkers be placed; on the square in the row headed 100- and
the column headed 100-, let one checker be placed; and so on,
until all the squares on the checkerboard for which there are
frequencies in Table 25 are covered with the proper number of
checkers piled on top of each other.

If the checkers were square rather than round, they would
stand up better and fill in all the area, helping to support each
other. If they were square, the resulting figure would resemble
a histogram for the given bivariate frequency distribution. A
picture of what such a histogram would look like is given in
Fig. 121.

In the foregoing example the heights of the various piles of
checkers represented the frequency of each cell. It would be
possible, however, so to adjust the vertical scale that the heights
of the piles of checkers represented the relative frequency of
each cell. If the checkers were square, giving a histogram
proper, then, further, it would be possible to adjust the vertical
scale so that the volume of each pile of square checkers measured
the relative frequencies. For example, since the class intervals
are 20 units each and the area of any cell is thus 400 square
units, the height of a pile of checkers taken to measure a
relative frequency of, say, 0.08 would be 0.0002 unit. This would,
of course, be very small, but then, in any model, the vertical
unit could be taken sufficiently large to offset this. That is,
instead of letting ¼ inch represent 1 unit (the thickness of one
checker, say), it would be possible to let 10,000 inches represent
1 unit. Then 0.0002 unit would be the equivalent of a pile of
eight checkers.
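The scaling argument may be retraced in Python. The sketch assumes, as the arithmetic of the example implies, that a single checker is one-quarter inch thick; the function name `block_height` and the other names are illustrative only.

```python
def block_height(relative_frequency, interval_x, interval_y):
    """Height of a block whose volume equals the relative frequency
    of a cell with the given class intervals."""
    return relative_frequency / (interval_x * interval_y)


height_units = block_height(0.08, 20, 20)    # height in vertical units
inches_per_unit = 10_000                     # enlarged vertical scale
checker_thickness = 0.25                     # inches; an assumption

height_inches = height_units * inches_per_unit
checkers = height_inches / checker_thickness

print(round(height_units, 6), round(height_inches, 3), round(checkers, 3))
```

With the vertical scale so enlarged, the volume of each pile, measured in the original units, still equals the relative frequency of its cell.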



Fig. 121. Histogram representation of a bivariate frequency distribution.
Rectangular blocks on the other side of the mean point are presumably obscured
from view.

Suppose, now, that a histogram is constructed so that volumes
of the square checkers erected on each cell represent the relative
frequency of that cell, and suppose that the number of cases is
indefinitely increased and at the same time the size of the class
intervals is made infinitesimally small. The result would be a
solid figure the top of which would tend to trace out a smooth
surface. This would be a frequency surface. A frequency
surface is thus the limit approached by a bivariate histogram as the
number of cases is indefinitely increased and the sizes of the class
intervals indefinitely reduced. If an area is traced out in the
$X_1X_2$ plane, the relative frequency of cases falling in this area is
given by the volume under the surface over that area.

FREQUENCY SURFACES 

Frequency surfaces may assume all sorts of shapes. They may
be symmetrical and bell-shaped, or they may be distorted by
skewness or excessive peakedness or flatness, depending on the
types of forces underlying the variation in the two variables.
First will be considered the case of a bivariate surface for variables
that are normally distributed and are independent of each other.

Bivariate-surface, Independent Variables. A monovariate
frequency distribution, it will be recalled, showed the relative
frequency of occurrence of various values of a given variable. A
joint, or bivariate, frequency distribution shows the relative
frequency of occurrence of various pairs of values of the two
given variables. Suppose, for example, that a marksman is
shooting at a target. The scatter of dots about the center of
Fig. 122 may be taken to illustrate the results of a large number
of such shots. The position of any particular shot relative to
the center of the target may be indicated by the amount of its
horizontal deflection (call it $x_1$) and by the amount of its vertical
deflection (call it $x_2$). The relative frequencies of various types
of shots may consequently be indicated by the relative
frequencies of various combinations of horizontal and vertical
deflections, that is to say, of various pairs of values of $x_1$ and $x_2$.

The relative frequency of shots in any given area of the target,
the $X_1X_2$ plane shown in Fig. 122, may be indicated by the density
of shots in that area or by the volume of some frequency surface
constructed over the $X_1X_2$ plane. The use of the surface for this
purpose is illustrated in Fig. 123.

It will be noted that the shots tend to be distributed
symmetrically around the center of the target. No tendency for
large vertical deviations to be associated with large horizontal
deviations in either a positive or a negative direction is evident.
Also, no tendency for vertical deviations to vary in any
particular way with horizontal deviations is apparent.




The figures and the example described above illustrate the
characteristics of a bivariate distribution where there is no
correlation between the two variables. This may be summarized as
follows: There is no correlation, that is to say, the variables are
independent of each other, because (1) for any given value of $X_1$,
the distribution of values of $X_2$ is the same, with the same mean
and standard deviation, as for any other value of $X_1$; (2) for any
given value of $X_2$, the distribution of values of $X_1$ is the same,
with the same mean and standard deviation, as for any other
value of $X_2$. When each variable is the result of a set of forces
that will produce a normal frequency distribution in that variable
alone and when the two sets of forces operate independently of
each other, the result will be a normal bivariate frequency
distribution with no correlation. The easiest way of generating a
normal bivariate frequency surface is to suppose that a form of
the normal frequency curve is held in a position perpendicular to
the base plane, as in Fig. 124. A knob is fixed to the top of the
frequency curve at B, and the center of the base line of the
frequency curve is fixed at A, so that it can revolve but so that
the line BA always remains perpendicular to the base plane CD.

If the form of this normal frequency curve is revolved in a
complete cycle until it reaches its original position again, the
frequency curve will "describe" the surface of the bivariate
normal frequency surface for independent variables, and it will
be like a system of symmetrically concentric circles such as
that shown in Fig. 123. For such a distribution of pairs of
observations $X_1$ and $X_2$, $r = 0$, for the $x_1x_2$ products are
distributed equally in the four quadrants, minus products canceling
plus products.
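The cancellation of plus and minus products can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the function name `product_moment_r`, the seed, the sample size, and the unit standard deviations are all chosen here for illustration and do not come from the text.

```python
import random

def product_moment_r(xs, ys):
    """Product-moment correlation of paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations from the respective means (the small x's of the text).
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    return sum_xy / (sum_xx * sum_yy) ** 0.5

rng = random.Random(1944)  # arbitrary fixed seed (assumption)
# Horizontal and vertical deflections produced by independent normal "forces".
horizontal = [rng.gauss(0.0, 1.0) for _ in range(20000)]
vertical = [rng.gauss(0.0, 1.0) for _ in range(20000)]
r = product_moment_r(horizontal, vertical)
print(round(r, 3))
```

With independent deflections the computed r should lie close to zero, the plus products in two quadrants being offset by the minus products in the other two.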

Fig. 124. The normal curve, revolution of which will produce
Fig. 123.

Mathematical Representation of Normal Bivariate Surface,
Independent Variables. Use of the $\bar{X}_1$ and $\bar{X}_2$ as the Origin. As
noted in the discussion of Fig. 122, the various $X_1X_2$ points
plotted in a bivariate plane may, with no difficulty, be described
in terms of their distances from the respective means. This has
the effect of shifting the axes so that the new axes are the lines
drawn perpendicular to the means of the respective scales; the
vertical line drawn through $\bar{X}_1$ in Fig. 122 is the $x_2$-axis, and the
horizontal line drawn through $\bar{X}_2$ in Fig. 122 is the $x_1$-axis.
Vertical and horizontal deviations from the center of the circle
are $x_1$ and $x_2$ variates. For many purposes it is more convenient
to use this method of describing points in a bivariate plane than
to use the original scales as the point of reference. In the following
pages, the more frequent appearance of $x_1$ and $x_2$, instead of
the capital letters, will be understood to signify the shift from
reference to the original axes to reference to the axes with the
origin at the means of the two variables.

Probability of Each Variate Taken Separately. If $x_1$ is a normally
distributed variate above and below the $\bar{X}_1$ and is completely
independent of $x_2$, the probability or relative frequency of
any value of $x_1$ between $x_1$ and $x_1 + dx_1$, whether associated
with large or with small values or with positive or negative
values of $x_2$, will be given by

$$dP(x_1) = \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-x_1^2/2\sigma_1^2}\, dx_1 \tag{1}$$

Similarly, if $x_2$ is a normally distributed variate and is completely
independent of $x_1$, the probability or relative frequency of any
value of $x_2$ between $x_2$ and $x_2 + dx_2$, whether associated with
large or with small values of $x_1$ or with positive or negative
values of $x_1$, will be given by

$$dP(x_2) = \frac{1}{\sigma_2\sqrt{2\pi}}\, e^{-x_2^2/2\sigma_2^2}\, dx_2 \tag{2}$$

Joint Probability of Two Variables. The joint probability or
joint relative frequency of an $x_1$ between $x_1$ and $x_1 + dx_1$ occurring
in association with an $x_2$ between $x_2$ and $x_2 + dx_2$ is the
product of the above two probabilities. In other words, the
joint probability of the two variables occurring in pairs of any
combination is given by

$$dP(x_1x_2) = \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-x_1^2/2\sigma_1^2} \cdot \frac{1}{\sigma_2\sqrt{2\pi}}\, e^{-x_2^2/2\sigma_2^2}\, dx_1\, dx_2 \tag{3}$$

which reduces to the following form:

$$dP(x_1x_2) = \frac{1}{2\pi\sigma_1\sigma_2}\, e^{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}\right)}\, dx_1\, dx_2 \tag{4}$$

Fig. 125. A normal bivariate frequency surface, independent variables.
[Here $\sigma_1 > \sigma_2$.]


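Geometrically, the $dP(x_1x_2)$ expressed in Eq. (4) describes the
volume of a column with breadth and width of $dx_1$ and $dx_2$ and a
height equal to
$\frac{1}{2\pi\sigma_1\sigma_2}\, e^{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}\right)}$.
Such a column is shown at P in Fig. 125.

The normal bivariate surface may be described, therefore,
as follows:

$$S(x_1x_2) = \frac{1}{2\pi\sigma_1\sigma_2}\, e^{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}\right)} \tag{5}$$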

If the two standard deviations are equal, the normal probability
surface is circular like Fig. 123. Horizontal planes
parallel to the base will intersect the figure in the form of circles
becoming smaller as the plane is elevated. Any vertical plane
parallel with the $x_1$-axis (a line through the $\bar{X}_2$) will intersect the
figure in the form of a normal curve with a standard deviation
equal to $\sigma_1$, and any vertical plane parallel with the $x_2$-axis will
intersect the figure in the form of a normal curve with a standard
deviation equal to $\sigma_2$. If the two standard deviations are equal,
these normal curves will be identical.

If, however, the two standard deviations are not equal, the
normal bivariate surface will be elliptical in form, as shown in
Fig. 125, rather than circular. Vertical planes drawn as before
will nevertheless intersect the surface in contours of normal
curves. The vertical normal curves will have standard
deviations equal to $\sigma_1$, and the horizontal normal curves will
have standard deviations equal to $\sigma_2$. Horizontal planes parallel
to the base in Fig. 125 will intersect the figure in the form of
ellipses, which will become smaller as the plane is elevated.
Figure 126 is the sort of ellipse that would be obtained by the
intersection of a plane horizontal to the base plane of Fig. 125.
The equation for the ellipse shown in Fig. 126 is

$$0.3x_1^2 + 6.7x_2^2 - 32 = 0$$

or

$$x_1 = \pm\sqrt{\frac{32 - 6.7x_2^2}{0.3}}$$

Pairs of $x_1$ and $x_2$ that satisfy this equation are

    x2         x1
    0         ±10.3
    ±0.5      ±10.0
    ±1.0      ±9.2
    ±1.5      ±7.5
    ±2.0      ±4.2
    ±2.18      0

Fig. 126. [Elliptical cross section of] the frequency surface of Fig. 125.
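The tabulated pairs can be recomputed directly from the equation of the ellipse. This is a minimal sketch; the function name `x1_on_ellipse` and its default arguments are introduced here for illustration, using the rounded coefficients 0.3, 6.7, and 32 printed in the text.

```python
import math

def x1_on_ellipse(x2, a=0.3, b=6.7, c=32.0):
    """Positive x1 satisfying a*x1**2 + b*x2**2 - c = 0."""
    return math.sqrt((c - b * x2 * x2) / a)

# Recompute x1 for the x2 values listed in the table.
for x2 in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"x2 = +/-{x2:<4}  x1 = +/-{x1_on_ellipse(x2):.2f}")

# The ellipse meets the x2-axis where x1 = 0.
x2_intercept = math.sqrt(32.0 / 6.7)
print(f"x2 intercept = +/-{x2_intercept:.2f}")
```

Because the printed coefficients are rounded, the recomputed values agree with the table only to within about a tenth of a unit.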


Bivariate Surface, Correlated Variables. Instead of two independent
variables, suppose there is a set of paired variables in
which is displayed a marked tendency for positive correlation, so
that large values of $X_2$ are associated with large values of $X_1$,
and vice versa. This is the same as to say that positive values
of $x_1$ occur predominantly with positive values of $x_2$ and negative
values of $x_1$ occur predominantly with negative values of $x_2$, the
small x's measuring in each case the deviations from respective
means. Assume that each distribution taken separately is a
symmetrical one like a in Fig. 127 and a in Fig. 128. In Fig.
127 let a represent the frequency curve of the total distribution
of the $X_2$ variable. Then suppose this frequency distribution of
all the variates of the variable $X_2$ is cross-classified into three
groups, (1) those $X_2$'s associated with large values of $X_1$, (2)
those associated with the ordinary or average range of values of
$X_1$, and (3) those associated with small values of $X_1$.



The plane is accordingly divided vertically into three parts
representing the range of (1) large values of $X_1$ (this part of the
plane is labeled $\beta$ in Fig. 127), (2) ordinary or average range of
$X_1$ values, represented in the figure by $\gamma$, and (3) small values of
$X_1$, represented by $\delta$ in the figure.

By summarizing in a group those variates of $X_2$ associated with
large values of $X_1$ (those in the range of $\beta$ in Fig. 127), and under
the assumption that large values of $X_2$ are associated with large
values of $X_1$, a frequency distribution like b, whose mean would
be larger than the $\bar{X}_2$ of the total population of variables $X_2$,
would be obtained. The line AA' intersects the base of the
frequency curve b at its mean point.

By summarizing in a group the $X_2$ variables in the $\gamma$ range of
Fig. 127, a frequency distribution of $X_2$ variables like c would be
obtained; then the one showing the $X_2$ variables associated with
$X_1$ in the range of $\delta$ would give a frequency distribution like d.
The line AA' in Fig. 127 also passes through the mean of the
frequency curve d. In other words, the means of curves b, c, and
d all lie on the same straight line, AA'.

The $X_1$ variable is treated in a similar manner in Fig. 128, in
which a represents the frequency curve of all of the values of the
$X_1$ variable. This frequency distribution of all the $X_1$ variables
is then cross-classified into three groups, (1) those associated
with small values of $X_2$, (2) those associated with ordinary or
average range of values of $X_2$, and (3) those associated with large
values of $X_2$. The plane of Fig. 128 is accordingly divided
horizontally into three parts, representing the range of (1) small
values of $X_2$ (this part of the plane is labeled $\beta$ in Fig. 128); (2)
ordinary or average range of $X_2$ values, represented in the figure
by $\gamma$; and (3) large values of $X_2$, represented by $\delta$ in the figure.
By summarizing in one frequency distribution the variates of $X_1$
associated with small values of $X_2$ (those in the range of $\beta$ in
Fig. 128), under the assumption that small values of $X_1$ are
associated with small values of $X_2$, a frequency distribution like
b, whose mean is smaller than the mean of the total population
of variable $X_1$, would be obtained.

By summarizing in one group the $X_1$ variables in the range $\gamma$
of the $X_2$ variable, a frequency distribution of $X_1$ variables like
c would be obtained; the group of $X_1$ variables associated with
$X_2$ in the range of $\delta$ will give a frequency distribution like d.
The line passing through the means of these three frequency
distributions would be like BB' in Fig. 128.
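The cross-classification described for Fig. 127 can be imitated with artificial data. The sketch below rests on assumptions made only for illustration: a correlation of 0.6, unit standard deviations, an arbitrary seed, and cut points at plus and minus one-half standard deviation to mark off the three ranges of $X_1$. None of these values comes from the text.

```python
import random

rng = random.Random(127)   # arbitrary fixed seed (assumption)
r = 0.6                    # assumed degree of correlation
pairs = []
for _ in range(30000):
    x1 = rng.gauss(0.0, 1.0)
    # x2 takes a share r of x1 plus an independent normal part,
    # so that both variates have unit standard deviation.
    x2 = r * x1 + (1 - r * r) ** 0.5 * rng.gauss(0.0, 1.0)
    pairs.append((x1, x2))

def mean_x2_where(condition):
    """Mean of the x2 variates whose paired x1 satisfies the condition."""
    selected = [x2 for x1, x2 in pairs if condition(x1)]
    return sum(selected) / len(selected)

mean_small = mean_x2_where(lambda x1: x1 < -0.5)           # small values of X1
mean_middle = mean_x2_where(lambda x1: -0.5 <= x1 <= 0.5)  # ordinary range
mean_large = mean_x2_where(lambda x1: x1 > 0.5)            # large values of X1
print(round(mean_small, 2), round(mean_middle, 2), round(mean_large, 2))
```

The three group means rise with the range of $X_1$, which is the behavior the line AA' is drawn to represent.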

Normal Correlation Surface, Correlated Variables. A bivariate
frequency distribution showing the joint variation of two correlated
variables would thus appear to be represented by a
frequency surface that is turned so as to make an angle with the
$x_1$- and $x_2$-axes. A picture of a normal bivariate frequency surface
for correlated variables is shown in Fig. 129. Figures 127
and 128 constitute analyses of the frequencies of Fig. 129 that
divided the surface into three parts, first up and down and second
left and right. The three figures, therefore, are an attempt to
view the same distribution in three different ways. If any cross
section is taken of the surface represented by Fig. 129, parallel to
the $x_1$-axis, the cross section will have the form of a normal
frequency curve with its mean on the line bb'. Any cross section
of this surface taken parallel to the $x_2$-axis will have the form of
a normal frequency curve with its mean on the line aa'. Such
cross sections are similar in character to the frequency curves
b, c, and d, discussed in connection with Figs. 127 and 128,
respectively. Typical cross sections are likewise shown in
Fig. 129.

Fig. 129. A normal bivariate frequency surface, correlated variables.

Careful study of Figs. 127 to 129 will aid greatly in the
understanding of the theory of correlation. They serve also as the
basis for comprehending the theoretical explanation in the ensuing
section.

Derivation of Equation for Bivariate Normal Frequency Distribution,
Correlated Variables. Equation of a Rotated Ellipse.
A quadratic equation of the general form

$$aX_1^2 + 2hX_1X_2 + bX_2^2 + 2gX_1 + 2fX_2 + c = 0 \tag{6}$$

is an ellipse under the following conditions:¹

$$ab - h^2 > 0 \quad \text{and} \quad D \neq 0$$

where

$$D = \begin{vmatrix} a & h & g \\ h & b & f \\ g & f & c \end{vmatrix} = abc + hgf + gfh - af^2 - ch^2 - bg^2$$

For example, the equation

$$X_1^2 - 4X_1X_2 + 6X_2^2 - 24X_1 + 64X_2 + 144 = 0 \tag{6'}$$

is an ellipse like that shown in Fig. 130, expressed with reference
to the large $X_1X_2$-axes. The center of the ellipse is at $X_1 = 4$,
$X_2 = -4$.² The equation for an ellipse with reference to the axes
passing through its center is³

$$a'x_1^2 + 2h'x_1x_2 + b'x_2^2 + c' = 0 \tag{7}$$

where $a' = a$, $h' = h$, $b' = b$, and $c' = D/(ab - h^2)$.

For Eq. (6') the new form is

$$x_1^2 - 4x_1x_2 + 6x_2^2 - 32 = 0 \tag{7'}$$

Fig. 130. A horizontal cross section of a normal bivariate surface,
correlated variables.

¹ Fine, H. B., and H. D. Thompson, Coordinate Geometry, pp. 137-138.

² The center of the ellipse is found by solving the following two
equations for $X_1$ and $X_2$:

$$aX_1 + hX_2 + g = 0$$
$$hX_1 + bX_2 + f = 0$$

In this problem, $a = 1$, $h = -2$, $g = -12$, $b = 6$, and $f = 32$.

³ Cf. Fine and Thompson, op. cit.
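The arithmetic of the example can be retraced in a few lines. The following sketch is not taken from the text: the function name `conic_center_and_constant` is hypothetical, and the two center equations are solved here by Cramer's rule as one convenient choice.

```python
def conic_center_and_constant(a, h, b, g, f, c):
    """Center and constant c' of a*X1^2 + 2h*X1*X2 + b*X2^2 + 2g*X1 + 2f*X2 + c = 0."""
    minor = a * b - h * h
    # Determinant D of the text, written with the two equal terms combined.
    D = a * b * c + 2 * f * g * h - a * f * f - b * g * g - c * h * h
    if minor <= 0 or D == 0:
        raise ValueError("the stated conditions for an ellipse are not met")
    # Solve a*X1 + h*X2 + g = 0 and h*X1 + b*X2 + f = 0 by Cramer's rule.
    center_x1 = (h * f - b * g) / minor
    center_x2 = (h * g - a * f) / minor
    return (center_x1, center_x2), D / minor

# Coefficients of Eq. (6'): a = 1, h = -2, b = 6, g = -12, f = 32, c = 144.
center, c_prime = conic_center_and_constant(1, -2, 6, -12, 32, 144)
print(center, c_prime)
```

For the coefficients of Eq. (6') the determinant is $D = -64$ and $ab - h^2 = 2$, so that $c' = -32$, which is the constant appearing in Eq. (7').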





Following are the solutions for Eqs. (6') and (7'), from which
Fig. 130 was drawn, the two equations describing the same
ellipse:

Equation (6'). Solution: $X_1 = 2X_2 + 12 \pm \sqrt{-2(X_2^2 + 8X_2)}$

    X2      X1
     0      12 ± 0   = 12
    -1      10 ± √14 = 13.74,  6.3
    -2       8 ± √24 = 12.9,   3.1
    -3       6 ± √30 = 11.5,   0.5
    -4       4 ± √32 =  9.7,  -1.7
    -5       2 ± √30 =  7.5,  -3.5

Equation (7'). Solution: $x_1 = 2x_2 \pm \sqrt{32 - 2x_2^2}$

    x2      x1
     0       0 ± √32  = ±5.7
    ±1      ±2 ± √30  = ±7.5, ∓3.5
    ±2      ±4 ± √24  = ±8.9, ∓0.9
    ±3      ±6 ± √14  = ±9.7, ±2.3
    ±3.5    ±7 ± √7.5 = ±9.7, ±4.3
    ±4      ±8 ± 0    = ±8

The equations of the axes of the ellipse are obtained by finding
the roots of $\lambda$ in the following equation:

$$h'\lambda^2 + (a' - b')\lambda - h' = 0$$

or, in this case,

$$-2\lambda^2 - 5\lambda + 2 = 0$$

The roots are $\lambda = 0.351$ and $\lambda = -2.85$, each being the slope
$x_2/x_1$ of one of the axes. The equation for the major axis of the
ellipse is therefore $x_2 = 0.351x_1$, that is, $x_1 = 2.85x_2$, and the
equation for the minor axis is $x_2 = -2.85x_1$.
Referred to its own major and minor axes, the equation of the
ellipse is $Ax_1^2 + Bx_2^2 + C = 0$, where A and B are obtained from

$$A + B = a' + b' \qquad AB = a'b' - h'^2 \qquad C = c'$$

and the condition that $A - B$ has the same sign as $h'$. For this
ellipse it is thus found that $A = 0.3$ and $B = 6.7$. The equation
for this ellipse referred to its own axes (see Fig. 126) is

$$0.3x_1^2 + 6.7x_2^2 - 32 = 0$$
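The reduction of the rotated ellipse to its own axes may be sketched as follows. The function name `refer_to_own_axes` is an assumption made for illustration, and the slopes are taken as values of $x_2/x_1$; the sketch applies only when $h' \neq 0$, that is, when the ellipse is actually rotated.

```python
import math

def refer_to_own_axes(a, h, b):
    """Coefficients A, B and axis slopes for a*x1^2 + 2h*x1*x2 + b*x2^2 + c' = 0 (h != 0)."""
    half_sum = (a + b) / 2
    half_diff = math.sqrt(((a - b) / 2) ** 2 + h * h)
    # A + B = a + b, A*B = a*b - h^2, and A - B has the same sign as h.
    A = half_sum + math.copysign(half_diff, h)
    B = half_sum - math.copysign(half_diff, h)
    # Slopes x2/x1 of the two axes: roots of h*m^2 + (a - b)*m - h = 0.
    disc = math.sqrt((a - b) ** 2 + 4 * h * h)
    slopes = sorted([(-(a - b) + disc) / (2 * h), (-(a - b) - disc) / (2 * h)])
    return A, B, slopes

# Coefficients of Eq. (7'): a' = 1, h' = -2, b' = 6.
A, B, slopes = refer_to_own_axes(1, -2, 6)
print(round(A, 3), round(B, 3), [round(m, 3) for m in slopes])
```

The product of the two slopes is $-1$, as it must be for perpendicular axes, and the reciprocal of the positive slope gives the coefficient 2.85 of the major axis.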

Mathematical Representation of a Bivariate Normal Correlation
Surface. It was noted above that the bivariate normal surface
in which $x_1$ and $x_2$ are independent of each other (that is, in
which no correlation exists between them) is of the form

$$S(x_1x_2) = \frac{1}{2\pi\sigma_1\sigma_2}\, e^{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}\right)}$$

The constant term, $1/2\pi\sigma_1\sigma_2$, is a constant dependent on the
values of the two standard deviations in any particular instance.
The product of this constant times the term
$e^{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}\right)}$ gives,
for various values of $x_1$ and $x_2$, the height of the bivariate surface
from the base (the distance OP in Fig. 125). If a horizontal plane
parallel with the base plane is drawn through the normal bivariate
surface at a distance OP from the base plane, the intersection of
the plane and the bivariate surface will be an ellipse (as in Fig.
126) if the standard deviations are unequal; the intersection will
be a circle (as in Fig. 123) if the two standard deviations are equal.
Such a plane represents the locus of all points distant OP from the
base plane, and the passing of such a horizontal plane through
the bivariate surface is equivalent to setting the expression
$e^{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}\right)}$
equal to a constant, which is equivalent to putting

$$\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2} = c \tag{8}$$

This equation represents a circle if $\sigma_1 = \sigma_2$ and an ellipse if
$\sigma_1 \neq \sigma_2$.

The smaller the constant c, the smaller will be the circle or
ellipse until, at the peak of the bivariate surface, a very small
circle or ellipse will be found, and finally just a point.
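The connection between the height of the cutting plane and the size of the resulting ellipse can be made explicit. In the sketch below the height is expressed as a fraction of the peak height, a convention adopted here for convenience and not used in the text; the function name is likewise hypothetical. Setting the exponential term equal to that fraction gives $c = -2\log_e(\text{fraction})$, and Eq. (8) then has semi-axes $\sigma_1\sqrt{c}$ and $\sigma_2\sqrt{c}$.

```python
import math

def contour_semi_axes(height_fraction, sigma1, sigma2):
    """Semi-axes of the curve cut by a plane at a given fraction of the peak height."""
    # exp(-c/2) = height_fraction, where c is the constant of Eq. (8).
    c = -2.0 * math.log(height_fraction)
    return sigma1 * math.sqrt(c), sigma2 * math.sqrt(c)

# Planes raised successively nearer the peak cut smaller and smaller ellipses.
for fraction in (0.1, 0.5, 0.9, 0.99):
    semi_1, semi_2 = contour_semi_axes(fraction, 2.0, 1.0)
    print(fraction, round(semi_1, 3), round(semi_2, 3))
```

The ratio of the two semi-axes is always $\sigma_1/\sigma_2$, so the ellipses at every level have the same shape and reduce to circles when the standard deviations are equal.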

If the two variables are correlated, two changes occur. (1)
The ellipse is rotated. (2) The ellipse is narrowed. If before
correlation the surface is circular in form, owing to the fact that
the standard deviations are equal, the existence of correlation
will cause the circle to be converted into a rotated ellipse, narrowing
the circle to an elliptical form. If before correlation the
surface is elliptical in form, owing to the fact that the standard
deviations are unequal (see Fig. 126), the existence of correlation
will cause the ellipse to rotate and also to become narrower.
This phenomenon is explained as follows:

If larger than average values of $X_1$ cause $X_2$ to be larger than
average and smaller than average values of $X_1$ cause $X_2$ to be
smaller than average, the pull exerted on $X_2$ values is indicated
by the arrows in Fig. 131. The larger the $X_1$, the more pull
will be exercised upon $X_2$ to make it larger than its average.
This is indicated by making arrow (1) longer than arrows (2),
(3), and (4), which, respectively, represent the degree to which
successively smaller values of $X_1$ affect values of $X_2$, until, by
the time $X_1$ becomes smaller than average (below the line $\bar{X}_1$),
arrow (4') points to the negative pull, that is, causing $X_2$ to be
less than its average.

Fig. 131. Illustrating the difference between the nonexistence and
existence of correlation in a normal bivariate frequency surface.

When correlation exists, this means that bivariate frequencies
located in quadrant II, where $X_2$ is larger and $X_1$ is smaller
than average, tend to move over to quadrant I, where $X_1$ and
$X_2$ are both larger than average. Bivariate frequencies already
located in quadrant I are less affected. Similarly, bivariate
frequencies in quadrant IV tend to move to quadrant III, where
both $X_2$ and $X_1$ are smaller than average, while bivariate
frequencies in quadrant III are less affected. The result is that
the rotated ellipse becomes narrowed as shown in the part of
Fig. 131 at the right. Any horizontal plane parallel to the base
of a correlated bivariate (Fig. 129) will intersect the bivariate
frequency surface in the form of an ellipse such as that shown in
the right half of Fig. 131: large ellipses near the base plane, and
smaller and smaller ellipses as the horizontal plane is raised
higher and higher from the base. These ellipses have the
equation

$$ax_1^2 + 2hx_1x_2 + bx_2^2 + c = 0$$





As already noted, the middle term $2hx_1x_2$ is present in the
equation because the ellipse is rotated and is now
described in terms of axes other than its own, although the origin
remains the center of the ellipse. The middle term is thus
present because of correlation, which causes the rotation of the
ellipse. This middle term is generally called the "product term"
because it is the product of the two variables. When there is
no correlation, this middle term disappears.¹ The narrowing of
the ellipse, as will be seen, results in the increase in the value of
the constant term $\frac{1}{2\pi\sigma_1\sigma_2}$.

Since the normal bivariate surface in which $X_1$ and $X_2$ are
correlated is thus elliptical in form but rotated and narrower than
the elliptical surface representing uncorrelated bivariates, the
distribution of probabilities or relative frequencies will be given
by an expression of the form

$$dP(x_1x_2) = k\, e^{-(ax_1^2 + 2hx_1x_2 + bx_2^2)}\, dx_1\, dx_2 \tag{9}$$

This is the general formula for a normal bivariate frequency
distribution of correlated variables. The remainder of the
argument, which appears in the Appendix to this chapter, shows
how the parameters k, a, h, and b may be evaluated in terms of
the moments of $X_1$ and $X_2$. When the proper values of the
parameters are inserted, the formula is as follows:²

$$dP(x_1x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\, e^{-\frac{1}{2(1 - r^2)}\left(\frac{x_1^2}{\sigma_1^2} - \frac{2rx_1x_2}{\sigma_1\sigma_2} + \frac{x_2^2}{\sigma_2^2}\right)}\, dx_1\, dx_2 \tag{10}$$

This probability expression describes a normal bivariate
frequency distribution such as that graphed in Fig. 129. The
rotated position is reflected in the fact that the exponent of e
has a middle "product term." The fact that the surface is
narrower than it would be if there were no correlation is reflected
in the character of the constant term, which is larger than the
constant term of a normal bivariate frequency surface of
uncorrelated variables.* In other words, because r cannot be greater
than 1,

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}} \geq \frac{1}{2\pi\sigma_1\sigma_2}$$

The degree to which the constant term in the correlated surface
is larger depends upon the value of r. If $r = 0$, the constant
term becomes identical with the constant term of the uncorrelated
surface. If $r = 1$, the constant term of the correlated
surface becomes infinitely large, reflecting the fact that when
$r = 1$ the surface becomes so narrowed that it is a plane, all
points being on the line of regression.

¹ See p. 475.

² See Appendix, pp. 492-496.
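The growth of the constant term with r is easily tabulated. The sketch below is illustrative only; the function name `peak_height` and the chosen values of r are assumptions, and unit standard deviations are used so that the ratio to the uncorrelated constant is simply $1/\sqrt{1 - r^2}$.

```python
import math

def peak_height(sigma1, sigma2, r):
    """Constant term of Eq. (10): the height of the surface at the origin."""
    return 1.0 / (2 * math.pi * sigma1 * sigma2 * math.sqrt(1 - r * r))

uncorrelated = peak_height(1.0, 1.0, 0.0)
# Ratio of the correlated to the uncorrelated constant term.
ratios = {r: peak_height(1.0, 1.0, r) / uncorrelated for r in (0.0, 0.5, 0.9, 0.99)}
for r, ratio in ratios.items():
    print(r, round(ratio, 3))
```

The ratio equals one when $r = 0$ and increases without limit as r approaches 1, in agreement with the description of the surface being stretched upward as it is narrowed.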

LINES OF REGRESSION

In the discussion of Fig. 127 it was pointed out that the line
AA' passes through the means of frequency distributions b, c,
and d. Similarly, in the discussion of Fig. 128, it was said that
the line BB' passes through the means of frequency distributions
b, c, and d. In the discussion of Fig. 129 the line aa' was said to
pass through the means of any frequency distribution made by a
vertical plane parallel with the $x_2$-axis, and the line bb' was said
to pass through the means of any frequency distribution made
by a vertical plane parallel with the $x_1$-axis. These two lines
are thus the progressions of the means for the normal bivariate
surface. As will be shown shortly, they are also the least-squares
lines that might be fitted to the surface. In both senses,
therefore, they are the lines of regression for the surface.

If there is no correlation, as illustrated by Figs. 122, 123, and
126, the two lines of regression correspond with the major and
minor axes of the ellipse, that is, with the axes represented by
the $\bar{X}_1$ and $\bar{X}_2$ lines of Fig. 122 or Fig. 126. By hypothesis, in
the uncorrelated bivariate surface the mean of any frequency
distribution made by a vertical plane parallel to the $x_2$-axis will
be on the $\bar{X}_2$ line, and the mean of any frequency distribution
made by a vertical plane parallel to the $x_1$-axis will be on the $\bar{X}_1$
line. When the surface is rotated and narrowed, as a result of
correlation, it is part of the hypothesis that the normal symmetry
of the surface remains and accordingly the means remain in a
straight line, but a straight line at an acute angle rather than
perpendicular to the original axis.

* The narrowing is due to a stretching upward of a given volume. As
indicated, in the limiting situation ($r = 1$), the surface becomes a vertical
plane stretching to an infinite height and having an infinitesimal thickness.

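Mathematical Representation of Lines of Regression. The
bivariate normal correlation surface in terms of probabilities
has been found to be described as follows:

$$dP(x_1x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\, e^{-\frac{1}{2(1 - r^2)}\left(\frac{x_1^2}{\sigma_1^2} - \frac{2rx_1x_2}{\sigma_1\sigma_2} + \frac{x_2^2}{\sigma_2^2}\right)}\, dx_1\, dx_2 \tag{11}$$

A line of regression, for example, the line of regression of $X_2$ on
$X_1$, is a general description of the law of relationship by which
for a given value of $x_1$ the most probable value of $x_2$ may be
determined. Equation (11) describes the joint probability of any
bivariate $x_1, x_2$. The probability of any value of $x_2$ occurring
with some specified value of $x_1$, say $\bar{x}_1$, will be as follows:

$$dP(\bar{x}_1x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\, e^{-\frac{1}{2(1 - r^2)}\left[\left(\frac{\bar{x}_1}{\sigma_1}\right)^2 - \frac{2r\bar{x}_1x_2}{\sigma_1\sigma_2} + \frac{x_2^2}{\sigma_2^2}\right]}\, dx_1\, dx_2 \tag{12}$$

If $(\bar{x}_1/\sigma_1)^2$ is factored from the exponent of e, the equation
becomes

$$dP(\bar{x}_1x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\, e^{-\frac{(\bar{x}_1)^2}{2\sigma_1^2(1 - r^2)}}\, e^{-\frac{1}{2(1 - r^2)}\left(\frac{x_2^2}{\sigma_2^2} - \frac{2r\bar{x}_1x_2}{\sigma_1\sigma_2}\right)}\, dx_1\, dx_2 \tag{13}$$

The square of $\frac{x_2^2}{\sigma_2^2} - \frac{2r\bar{x}_1x_2}{\sigma_1\sigma_2}$ is completed by adding $\frac{r^2(\bar{x}_1)^2}{\sigma_1^2}$,
which must also be subtracted to keep the value of the whole
expression unchanged. This subtracted part may be conveniently
put with the other $(\bar{x}_1)^2$ term so that the final result of
these operations is as follows:

$$dP(\bar{x}_1x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\, e^{-\frac{(\bar{x}_1)^2 - r^2(\bar{x}_1)^2}{2\sigma_1^2(1 - r^2)}}\, e^{-\frac{1}{2(1 - r^2)}\left(\frac{x_2}{\sigma_2} - \frac{r\bar{x}_1}{\sigma_1}\right)^2}\, dx_1\, dx_2 \tag{14}$$

Upon simplifying the exponents and splitting up the constant
term and the $dx_1\,dx_2$, this expression becomes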





$$dP(\bar{x}_1x_2) = \left[\frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-\frac{(\bar{x}_1)^2}{2\sigma_1^2}}\, dx_1\right]\left[\frac{1}{\sigma_2\sqrt{1 - r^2}\,\sqrt{2\pi}}\, e^{-\frac{\left(x_2 - r\frac{\sigma_2}{\sigma_1}\bar{x}_1\right)^2}{2\sigma_2^2(1 - r^2)}}\, dx_2\right] \tag{15}$$

Since the first factor is a constant ($\bar{x}_1$ being given), Eq. (15)
shows that the probability of an $x_2$ for a given value of $x_1$ is
proportional to the probability of a normally distributed variate
whose mean is $r\frac{\sigma_2}{\sigma_1}\bar{x}_1$ and whose standard deviation is
$\sigma_2\sqrt{1 - r^2}$. (It will be recalled that the general equation for
the normal curve is $y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}\, dx$, where x is the
deviation from the mean.) Accordingly, the most probable value
of $x_2$ for specified values of $x_1$, that is, the line of regression of
$x_2$ on $x_1$, is as follows:

$$x_2 = r\frac{\sigma_2}{\sigma_1}x_1$$

The standard deviation or scatter about this line is $\sigma_2\sqrt{1 - r^2}$.
From Eq. (15) it is seen that the locus of all points representing
the means of $x_2$ for a given $x_1$ is $x_2 = r\frac{\sigma_2}{\sigma_1}x_1$, which is the
equation of the line of regression of $x_2$ on $x_1$. The line of regression of
$x_1$ on $x_2$ is given by interchanging $x_1$ and $x_2$ in the above argument.
As indicated above, these two lines are the same as
those that might be fitted to the distribution by the method of
least squares. From Eq. (15) it is also shown that the standard
deviation of $x_2$ for a given $x_1$ (in other words, the scatter at any
point of the line of regression) is independent of the selected
value of $x_1$, for it is always equal to $\sigma_2\sqrt{1 - r^2}$.
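The two conclusions drawn from Eq. (15) can be verified numerically without relying on the algebra. The sketch below takes a vertical slice of the surface of Eq. (11) at a chosen $x_1$ and computes the mean and standard deviation of $x_2$ in that slice by simple summation. The function names, the grid spacing, and the values of $\sigma_1$, $\sigma_2$, and r are assumptions made purely for illustration.

```python
import math

def joint_density(x1, x2, sigma1, sigma2, r):
    """Ordinate of the surface of Eq. (11) at the point (x1, x2)."""
    q = (x1 / sigma1) ** 2 - 2 * r * x1 * x2 / (sigma1 * sigma2) + (x2 / sigma2) ** 2
    const = 2 * math.pi * sigma1 * sigma2 * math.sqrt(1 - r * r)
    return math.exp(-q / (2 * (1 - r * r))) / const

def array_mean_and_sd(x1, sigma1, sigma2, r, step=0.01, half_width=12.0):
    """Mean and standard deviation of x2 in the vertical slice at a given x1."""
    n = int(round(half_width / step))
    x2_values = [i * step for i in range(-n, n + 1)]
    weights = [joint_density(x1, x2, sigma1, sigma2, r) for x2 in x2_values]
    total = sum(weights)
    mean = sum(w * x2 for w, x2 in zip(weights, x2_values)) / total
    variance = sum(w * (x2 - mean) ** 2 for w, x2 in zip(weights, x2_values)) / total
    return mean, math.sqrt(variance)

sigma1, sigma2, r = 2.0, 1.5, 0.8      # illustrative values only
mean, sd = array_mean_and_sd(3.0, sigma1, sigma2, r)
print(round(mean, 4), round(sd, 4))
print(round(r * sigma2 / sigma1 * 3.0, 4), round(sigma2 * math.sqrt(1 - r * r), 4))
```

The first printed pair comes from the summation and the second from the formulas $r(\sigma_2/\sigma_1)x_1$ and $\sigma_2\sqrt{1 - r^2}$; repeating the calculation at another $x_1$ changes the mean but not the standard deviation.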

NORMAL MULTIVARIATE FREQUENCY "SURFACE"

When a bivariate distribution is described in geometrical
terms, one of the dimensions can be used to measure the
frequencies. This is not possible for distributions involving more
than two variables. In the three-variable case, for example, all
three dimensions must be used to indicate the variations in the
variables themselves, and none is left to measure the frequencies.
Resort is had in multivariate problems to the use of densities
to measure frequencies. Such a device could have been used in
the monovariate or bivariate case; instead of having the
frequency of any interval represented by the height of a rectangle
erected on the interval, it could be assumed that the cases were
represented by points on a line, and the more points crowded into
any given interval on the line, i.e., the greater the density of
points in the interval, the greater would be the frequency of
that interval. Likewise, in the bivariate case, instead of
representing the frequency of cases in any given two-dimensional cell
by the height of a rectangular pile of checkers set up on that cell,
it would be possible to look upon the various cases as points in
the two-dimensional plane; the frequency of points in any cell
would then become the density of points in that cell.

This is the device used to measure frequencies in the multivariate
case. For a trivariate distribution, for example, the
various cases are looked upon as points in three-dimensional
space, and the density of these points in any given three-dimensional
cell becomes the measure of the relative frequency of cases
in that cell. A trivariate frequency "surface," if it may be so
called, is in reality a trivariate density function. The same idea
may be carried over by analogy to distributions of four or more
variables, although no graphical representation can actually be
made of such distributions.

The properties of a normal multivariate "surface" or density
function are merely generalizations of the properties of a normal
bivariate surface. Whereas in the latter case, loci of equiprobability
(i.e., loci of constant level on the frequency surface)
were ellipses in the $x_1x_2$-plane, in the multivariate case loci of
equiprobability (i.e., loci of equal density in the N-dimensional
space) are ellipsoids in the $X_1, X_2, \ldots, X_N$ space. A picture
of a three-dimensional ellipsoid is given in Fig. 132. This
represents a contour of equiprobability for a trivariate normal
distribution in which there is no correlation. Similar ellipsoids, some
larger, some smaller, would represent other contours of
equiprobability, and the whole distribution could be represented by a
nest of such ellipsoids. The elliptical contours representing a
high degree of probability are, of course, the contours close to
the center, the center itself being the point of maximum
probability (maximum density). As one goes off from the center
in a straight line in any direction whatsoever, the change in
probability (density) is in accordance with the normal law. If
the variables are measured in standard-deviation units, the
ellipsoids become spheres and the distribution becomes symmetrical
in all directions.

When there is correlation between the variables the ellipsoids
of equiprobability become tilted with respect to the various axes
and flattened out. If the variables are measured in standard-deviation
units, the degree of tilting in any direction is directly
related to the amount of the correlation between the variables
concerned. The greater the multiple correlation between the
variables, the narrower or flatter the ellipsoids become. In
the limit in which there is perfect correlation between all the
variables, the whole distribution reduces to a line through
the origin at an angle of approximately 54.7 degrees ($\cos^{-1} 1/\sqrt{3}$)
with all the axes (assuming the variables are measured in $\sigma$ units).
As in the simpler case, a plane or hyperplane of regression
represents the locus of the mean values of one variable for various
combinations of the other variables. For a normal multivariate
distribution, the deviations from any plane of regression are all
normally distributed with a constant standard deviation for any
one plane.

Fig. 132. Ellipsoid of equiprobability for a trivariate normal frequency
distribution.
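The angle quoted for the limiting case can be checked, and its dependence on the number of variables seen, with a very short calculation. The sketch assumes perfect positive correlation and variables in standard-deviation units, so that the line is $x_1 = x_2 = \cdots = x_N$ and its direction cosine with each axis is $1/\sqrt{N}$; the function name is introduced here for illustration.

```python
import math

def angle_with_axes(n_variables):
    """Angle in degrees between the line x1 = x2 = ... = xN and each axis."""
    return math.degrees(math.acos(1 / math.sqrt(n_variables)))

for n in (2, 3, 4):
    print(n, round(angle_with_axes(n), 1))
```

For two variables the angle is 45 degrees, the familiar diagonal of the bivariate case; for three it is the 54.7 degrees given in the text.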

All the properties of a normal bivariate distribution thus carry
over to a normal multivariate distribution, the only difference
being that ellipses of equiprobability and lines of regression now
become ellipsoids and hyperplanes of higher dimensions.
Basically, the character of the distribution is the same.

NONNORMAL BIVARIATES AND MULTIVARIATES

If a bivariate or multivariate distribution does not approach
the normal form, much of the conventional correlation analysis
loses its significance. In some cases, by taking logarithms or
reciprocals a nonnormal distribution may be transformed into a
normal distribution.¹ In some instances, a multivariate
distribution may be normal with respect to its variations about the
means of the rows and columns but the means of the rows or
means of columns may trace out a curve of regression. In other
instances, the regressions of the means may be linear, or planar,
but the deviations around these lines, or planes, of regression
may be either nonnormally distributed or normally distributed
with varying standard deviations.

If, in the case of two variables, the regressions are linear, the
initial arguments presented for the use of the product-moment
formula for r are still valid even for nonnormal distributions.²
Large values of $X_1$ would still tend in general to be associated
with large values of $X_2$ (or with small values if the correlation is
negative), and a formula based upon the product deviations
would give a good measure of the association between the two
variables. If the distribution of cases around the lines of
regression is skewed, however, or if the standard deviation varies from
one part of the line to another, the scatter about the lines of
regression loses its significance as a measure of typical variability.
Great care must be taken in these cases in using an average
scatter to determine the degree of error in an estimate based on
the line of regression. When the distributions are not normal,
the rule that two-thirds of the cases tend to lie between plus and
minus $\sigma_{1.2}$ no longer holds.

Finally, if the bivariate distribution is not normal, even the
product-moment formula may cease to be a statistic of special
significance in characterizing the distribution. In the normal
case, if the two means, the two standard deviations, and r are
all known, the bivariate distribution is fully determined. In the
nonnormal case, other measures similar to measures of skewness
and kurtosis in the monovariate case may be of equal if not
greater importance in describing the bivariate distribution.
These considerations should always be borne in mind when r is
used to measure correlation between nonnormally distributed
bivariates.

Similar statements may also be made about nonnormal
multivariate distributions. Here the higher dimensionality multiplies
the possibilities of skewness, kurtosis, and other departures from
normality.*

¹ See Chap. XV, pp. 377-396.

² See Chap. XIII, pp. 338-353.

APPENDIX 

DERIVATION OF THE EQUATION FOR THE NORMAL BIVARIATE 
FREQUENCY SURFACE, CORRELATED VARIABLES 

The normal bivariate surface in which $X_1$ and $X_2$ are correlated is ellip- 
tical in form but rotated and narrower than the elliptical surface repre- 
senting uncorrelated bivariates. The distribution of the probabilities or 
relative frequencies is given by an expression of the following form: 

$$dP(x_1x_2) = k\,e^{-\frac{1}{2}(a'x_1^2 + 2h'x_1x_2 + b'x_2^2)}\,dx_1\,dx_2 \qquad (16)$$

in which the constants $k$, $a'$, $h'$, and $b'$ may be evaluated in terms of the 
moments of $X_1$ and $X_2$. 

First, it is to be noted that 

$$\iint P(x_1x_2)\,dx_1\,dx_2 = 1 \qquad \text{(i)}$$

$$\iint P(x_1x_2)\,x_1\,dx_1\,dx_2 = 0 \qquad \text{(ii)}$$

$$\iint P(x_1x_2)\,x_2\,dx_1\,dx_2 = 0 \qquad \text{(iii)}$$

$$\iint P(x_1x_2)\,x_1^2\,dx_1\,dx_2 = \sigma_1^2 \qquad \text{(iv)}$$

$$\iint P(x_1x_2)\,x_2^2\,dx_1\,dx_2 = \sigma_2^2 \qquad \text{(v)}$$

$$\iint P(x_1x_2)\,x_1x_2\,dx_1\,dx_2 = r\sigma_1\sigma_2 \qquad \text{(vi)}$$

Equation (i) is true since the sum of all probabilities or relative frequencies 
is necessarily one. Equations (ii) and (iii) are true because $x_1$ and $x_2$ 
represent deviations from the means of $X_1$ and $X_2$. Thus 
$\iint P(x_1x_2)\,x_1\,dx_1\,dx_2$ is equivalent to $\Sigma x_1/N$, which equals zero. 
Likewise, 

$$\iint P(x_1x_2)\,x_2\,dx_1\,dx_2 = \frac{\Sigma x_2}{N} = 0$$

Equation (iv) is another form of $\Sigma x_1^2/N$, which is equal to the variance of 
$X_1$. Eq. (v) is equivalent to $\Sigma x_2^2/N$, which is equal to the variance of $X_2$, 
and Eq. (vi) is equivalent to $\Sigma x_1x_2/N$, which is equal to $r\sigma_1\sigma_2$, since 

$$r = \frac{\Sigma x_1x_2}{N\sigma_1\sigma_2}$$

¹ For a more complete consideration of the problem of nonnormality, 
see Smith and Duncan, Sampling Statistics, Chap. 18. 


Second, it is to be noted that, with reference to its rotated axes, aa' and 
bb', the equation of the ellipse representing the intersection of the fre- 
quency surface by a horizontal plane at a distance $ke^{-C/2}$ from the base 
plane is as follows: 

$$A{x_1'}^2 + B{x_2'}^2 = C$$

where $x_1'$ and $x_2'$ represent the coordinates of a point with reference to the 
axes aa' and bb'; that is to say, $x_1'$ measures the perpendicular distance of a 
point from aa' and $x_2'$ measures the perpendicular distance of a point from 
bb'. If the areal element¹ $dx_1\,dx_2$ is also expressed in terms of the $x_1'x_2'$ 
coordinates, it becomes $dx_1\,dx_2 = dx_1'\,dx_2'$.² The whole probability function 
thus becomes 

$$dP(x_1'x_2') = k\,e^{-\frac{1}{2}(A{x_1'}^2 + B{x_2'}^2)}\,dx_1'\,dx_2' \qquad (17)$$

But this is the form of a normal frequency surface for uncorrelated variables, 
so that, as seen above, pages 474-475, 

$$A = \frac{1}{{\sigma_1'}^2}, \qquad B = \frac{1}{{\sigma_2'}^2}$$

and, since there is no cross-product term, $H = 0$. 


¹ It will be recalled that $dP(x_1x_2) = F(x_1x_2)\,dx_1\,dx_2$ is represented geo- 
metrically by the volume under the surface $F(x_1x_2)$ cut off by a hollow pipe, 
erected on a rectangle in the $X_1X_2$ plane, the sides of which are $dx_1$ and $dx_2$ 
(see Fig. 125, p. 475). To express the whole probability distribution in 
terms of the new $x_1'x_2'$ coordinates, the area of the pipe's base, $dx_1\,dx_2$, must 
be transformed into these new coordinates as well as the height of the pipe, 
$F(x_1x_2)$. 

² The transformation of coordinates is of the form 

$$x_1' = x_2\sin\alpha + x_1\cos\alpha$$
$$x_2' = x_2\cos\alpha - x_1\sin\alpha$$

where $\alpha$ is the angle that aa' makes with the $X_2$-axis. Cf. Fine and Thomp- 
son, Coordinate Geometry, p. 120. Since, in general, $dx_1'\,dx_2'$ equals, within 
differentials of higher order, 

$$\begin{vmatrix} \dfrac{\partial x_1'}{\partial x_1} & \dfrac{\partial x_1'}{\partial x_2} \\[2ex] \dfrac{\partial x_2'}{\partial x_1} & \dfrac{\partial x_2'}{\partial x_2} \end{vmatrix}\,dx_1\,dx_2$$

it follows that 

$$dx_1'\,dx_2' = \begin{vmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{vmatrix}\,dx_1\,dx_2 = dx_1\,dx_2$$

since $\cos^2\alpha + \sin^2\alpha = 1$. Cf. Wilson, E. B., Advanced Calculus, pp. 
133-134. 


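The distribution function, Eq. (17), may therefore be written as follows: 

$$dP(x_1'x_2') = \frac{1}{2\pi\sigma_1'\sigma_2'}\,e^{-\frac{1}{2}\left(\frac{{x_1'}^2}{{\sigma_1'}^2} + \frac{{x_2'}^2}{{\sigma_2'}^2}\right)}\,dx_1'\,dx_2' \qquad (18)$$

where $\sigma_1'$ and $\sigma_2'$ are the standard deviations of the new variables $x_1'$ and $x_2'$. 
It will be noted that this transformation has not changed the probability 
of a given $x_1x_2$ combination but has merely expressed it in terms of a new 
set of coordinates. Accordingly, $P(x_1x_2) = P(x_1'x_2')$, where $x_1'$ and $x_2'$ are 
derived by a linear transformation from $x_1$ and $x_2$.¹ 

Finally, it will be noted that in any equation of the second degree the 
product of the coefficients of the squared terms minus the square of one- 
half the coefficient of the cross-product term is invariant (that is, its value 
remains unchanged) under simple translations and rotations.² Accordingly, 
the following relationships hold: 

$$AB - H^2 = a'b' - (h')^2$$

or, since $H = 0$, 

$$AB = a'b' - (h')^2$$

But inasmuch as $A = (1/\sigma_1')^2$ and $B = (1/\sigma_2')^2$, it follows that 

$$AB = \frac{1}{{\sigma_1'}^2{\sigma_2'}^2} = a'b' - (h')^2$$

From this it follows that 

$$\frac{1}{\sigma_1'\sigma_2'} = \sqrt{a'b' - (h')^2}$$

Use will now be made of these relationships to derive the values of $a'$, $b'$, 
and $h'$. 

As noted above, since 

$$\iint k\,e^{-\frac{1}{2}(a'x_1^2 + 2h'x_1x_2 + b'x_2^2)}\,dx_1\,dx_2 = 1$$

then 

$$\iint e^{-\frac{1}{2}(a'x_1^2 + 2h'x_1x_2 + b'x_2^2)}\,dx_1\,dx_2 = \frac{1}{k} = \frac{2\pi}{\sqrt{a'b' - (h')^2}} \qquad (19)$$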


If both sides of Eq. (19) are differentiated with respect to $a'$, it is found 
that 

¹ See footnote 2, p. 493. 
² Cf. Fine and Thompson, op. cit., p. 151. 



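$$-\frac{1}{2}\iint x_1^2\,e^{-\frac{1}{2}(a'x_1^2 + 2h'x_1x_2 + b'x_2^2)}\,dx_1\,dx_2 = -\frac{1}{2}\,\frac{2\pi b'}{[a'b' - (h')^2]^{3/2}} = -\frac{1}{2}\,\frac{b'}{k[a'b' - (h')^2]}$$

By canceling out $-\frac{1}{2}$ and multiplying the equation by $k$, the left side is 
equal to $\sigma_1^2$ [see Eq. (iv), page 492], and it is found that 

$$\sigma_1^2 = \frac{b'}{a'b' - (h')^2} \qquad \text{(a)}$$

or $b' = \sigma_1^2[a'b' - (h')^2]$. 

If similar procedure is followed after differentiating Eq. (19) with respect 
to $b'$, it is found that 

$$\sigma_2^2 = \frac{a'}{a'b' - (h')^2} \qquad \text{(b)}$$

or $a' = \sigma_2^2[a'b' - (h')^2]$. 

If both sides of Eq. (19) are differentiated with respect to $h'$, it is found that 

$$-\iint x_1x_2\,e^{-\frac{1}{2}(a'x_1^2 + 2h'x_1x_2 + b'x_2^2)}\,dx_1\,dx_2 = -\frac{1}{2}\,\frac{2\pi(-2h')}{[a'b' - (h')^2]^{3/2}} = -\frac{(-h')}{k[a'b' - (h')^2]}$$

in which, if multiplied through by $k$, the left side equals $-r\sigma_1\sigma_2$ [see Eq. 
(vi)], and hence the whole expression reduces to 

$$r\sigma_1\sigma_2 = \frac{-h'}{a'b' - (h')^2} \qquad \text{(c)}$$

or $h' = -\sigma_1\sigma_2 r[a'b' - (h')^2]$. 

From Eqs. (a), (b), and (c), it follows that 

$$b' = \frac{\sigma_1^2}{\sigma_2^2}\,a', \qquad h' = -\frac{r\sigma_1}{\sigma_2}\,a', \qquad \sqrt{a'b' - (h')^2} = \frac{\sigma_1}{\sigma_2}\,a'\sqrt{1 - r^2} \qquad \text{(d)}$$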


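Equations (a), (b), and (c) are three equations from which the values of 
$a'$, $b'$, and $h'$ may be expressed in terms of $\sigma_1$, $\sigma_2$, and $r$. The direct evalua- 
tion of $a'$, $b'$, and $h'$ from these equations is not a simple matter, however, 
and it is easier to proceed as follows: From Eqs. (a), (b), and (c), it is pos- 
sible to express $b'$ and $h'$ in terms of $a'$, as noted above in Eq. (d). It will 
also be recalled from Eq. (19) that 

$$k = \frac{\sqrt{a'b' - (h')^2}}{2\pi}$$

By substituting equivalent values, Eq. (16) may accordingly be written 
as follows: 

$$dP(x_1x_2) = \frac{\sigma_1 a'\sqrt{1 - r^2}}{2\pi\sigma_2}\,e^{-\frac{a'}{2}\left(x_1^2 - 2r\frac{\sigma_1}{\sigma_2}x_1x_2 + \frac{\sigma_1^2}{\sigma_2^2}x_2^2\right)}\,dx_1\,dx_2 \qquad (20)$$

The double sum $\iint dP(x_1x_2) = 1$, however, so that, from Eq. (20), it 
follows that 

$$\iint \frac{\sigma_1 a'\sqrt{1 - r^2}}{2\pi\sigma_2}\,e^{-\frac{a'}{2}\left(x_1^2 - 2r\frac{\sigma_1}{\sigma_2}x_1x_2 + \frac{\sigma_1^2}{\sigma_2^2}x_2^2\right)}\,dx_1\,dx_2 = 1 \qquad (21)$$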

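If both sides of Eq. (21) are differentiated with respect to $a'$, it is found 
that 

$$\iint \frac{\sigma_1\sqrt{1 - r^2}}{2\pi\sigma_2}\left[1 - \frac{a'}{2}\left(x_1^2 - 2r\frac{\sigma_1}{\sigma_2}x_1x_2 + \frac{\sigma_1^2}{\sigma_2^2}x_2^2\right)\right]e^{-\frac{a'}{2}\left(x_1^2 - 2r\frac{\sigma_1}{\sigma_2}x_1x_2 + \frac{\sigma_1^2}{\sigma_2^2}x_2^2\right)}\,dx_1\,dx_2 = 0$$

Multiplication of both sides by $-a'$ and expansion of terms then give the 
following: 

$$\frac{a'}{2}\iint\left(x_1^2 - 2r\frac{\sigma_1}{\sigma_2}x_1x_2 + \frac{\sigma_1^2}{\sigma_2^2}x_2^2\right)dP(x_1x_2) = \iint dP(x_1x_2) = 1$$

But the left side is equal to 

$$\frac{a'}{2}\left[\iint x_1^2\,dP(x_1x_2) - 2r\frac{\sigma_1}{\sigma_2}\iint x_1x_2\,dP(x_1x_2) + \frac{\sigma_1^2}{\sigma_2^2}\iint x_2^2\,dP(x_1x_2)\right]$$

which, according to Eqs. (iv) to (vi), is equal to 

$$\frac{a'}{2}\left(\sigma_1^2 - 2r^2\sigma_1^2 + \sigma_1^2\right) = a'\sigma_1^2(1 - r^2)$$

so that 

$$a' = \frac{1}{\sigma_1^2(1 - r^2)}$$

If this value of $a'$ is substituted in Eq. (20), it will give an equation in 
which all the parameters have been evaluated in terms of the moments as 
follows: 

$$dP(x_1x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\,e^{-\frac{1}{2(1 - r^2)}\left(\frac{x_1^2}{\sigma_1^2} - \frac{2r x_1x_2}{\sigma_1\sigma_2} + \frac{x_2^2}{\sigma_2^2}\right)}\,dx_1\,dx_2$$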


PART V 


Study of Dynamic Variability 

CHAPTER XIX 
INDEX NUMBERS 

One of the most widely used statistical methods is the proce- 
dure that gives rise to the summarizing or expression of data in 
the form of index numbers. It is an application to a practical 
problem of simple principles of ready comparison, principles of 
averaging to obtain summary figures, and principles of stratified 
sampling. Today, the method of index numbers is applied in 
five large fields, as follows: 

1. The measurement of the general price level, or the measure- 
ment of general exchange value. 

2. The measurement of groups of prices, such as wholesale 
prices, retail prices, or wages. 

3. The measurement of the general quantity of production or 
trade with indexes of physical production, trade, or employment. 

4. The measurement of the general volume of business or 
trade with indexes of the value of production or trade, or with 
so-called “barometers.” 

5. Miscellaneous, including a wide variety of uses, some of 
which are given below on pages 511-513. 

History of Index Numbers. General use of the device known 
as an “index number” to serve as a comprehensive method of 
summarization is of recent origin. Like most of the modern 
technique of statistics it has been developed since 1900. But 
the fundamental idea is an old one. According to Warren and 
Pearson, as early as 1738 Dutot made price comparisons showing 
that a group of representative commodities cost twelve times as 
much in 1735 as they did in 1508.¹ In 1764, an Italian, G. R. 

¹ Warren, George F., and Frank Pearson, Prices (1933), pp. 18-20, 
containing other interesting examples of attempts to measure changes in 
general price level prior to the middle of the nineteenth century. 


Carli, attempted an investigation into the effect of the discovery 
of America upon the purchasing power of money; he constructed 
a very simple index number of prices, using only three com- 
modities: grain, wine, and oil. He combined the prices of these 
three commodities in order to compare their average level in 
1750 with the level of the same commodities in 1500.¹ The 
gold movement from the New World to European countries 
aroused speculation throughout the mercantile period with 
respect to the relationship between prices and the amount of 
money in circulation. Locke and Hume laid the groundwork 
for the statement of what is now known as the quantity theory of 
money. Speculations of the seventeenth and eighteenth cen- 
turies, however, with the exception of Carli's unusual attempt, 
were without the assistance of any measurement of the general 
price level. 

Concern about the problem was brought to a new height during 
the Napoleonic Wars, when prices were fluctuating widely, and 
again during and following the Greenback era in the United 
States; the question of the relationship between the general price 
level and the money supply became associated with inflationary 
issue of paper money. In the decade preceding the Civil War 
the discoveries of gold in California served to arouse interest in 
the question of the effect of increased supplies of gold upon the 
general price level. 

Twentieth-century economists, already interested in the 
quantity theory of money by reason of the accumulation of these 
historical experiences, were provoked to continued and diligent 
study by the development of the South African gold mines since 
the 1890's, accompanying world-wide general rising prices until 
the First World War. During the First World War and the 
subsequent period of maladjustment, with countries all over the 
world alternately on and off the gold standard, speculation in 
monetary theory became of such general interest that the prob- 
lem preoccupied some economists almost to the exclusion of other 
fields of study. 

Meanwhile the statistical technique of measuring general price 
change by the index-number device was developed; by 1798 

¹ Mitchell, Wesley C., “Index Numbers of Wholesale Prices in the 
United States and Foreign Countries,” Bureau of Labor Statistics, Bulletin 
284, p. 7; cf. also reprint of Part I, “The Making and Using of Index Num- 
bers,” Bulletin 656 (1938), p. 7. 


Sir George Schuckburg-Evelyn formulated a plan for making 
index numbers of prices.¹ The efforts of the early statisticians in 
this direction were accorded but scant approval by the econ- 
omists, who were apparently suspicious of “political arith- 
metic.” Ricardo said that it is impossible to determine “the 
value of a currency” by its “relation not to one, but to the mass 
of commodities.”² Early in the nineteenth century mathe- 
maticians were more interested in the application of the theory 
of probabilities in the fields of astronomy, biology, anthropology, 
and geology. The great exponents of the developing technique 
in the application of statistical theory to the social sciences, such 
as Quételet, were busy with problems in the realm of ethics and 
morals; but about the middle of the nineteenth century came 
powerful support for the application of these principles to eco- 
nomic statistics. 

William Stanley Jevons claimed that the works of Quételet 
abundantly proved that many subjects in the social sciences are 
so hopelessly intricate that they can be analyzed only by the 
use of averages and by trusting to probabilities as the form of 
generalization. He constructed indexes of wholesale prices in 
order to measure the value of gold and invoked the theory of 
probabilities as justification of his claim that the rise in prices 
was connected with the change in the value of gold, saying that 
“the odds are 10,000 to 1 against a series of disconnected and 
casual circumstances having caused the rise of prices, one in the 
case of one commodity, another in the case of another, instead 
of some general cause acting over them all.” The general cause 
acting over them all was considered to be the change in the 
value of gold.³ 

In 1887 Prof. F. Y. Edgeworth began a series of contributions 
to the problem of index numbers as a method of summarizing 
trends in price statistics. He brought to bear upon the field of 
the social sciences the mathematical theory of probability. He 
saw clearly that it is a problem of applying a strictly a priori 

¹ “An Account of Some Endeavors to Ascertain a Standard of Weight 
and Measure,” Philosophical Transactions of the Royal Society of London, 
Part I, Art. viii, pp. 132-185; citation from Wesley C. Mitchell, Business 
Cycles: The Problem and Its Setting (1928), p. 191. 

² Ibid., p. 193. 

³ Ibid., p. 195. 

theory to an analogous situation, but he insisted that the theory 
of probability applied.¹ Later, the theoretical application of 
probabilities to the problem of measuring social phenomena, and 
particularly the general price level, was taken up by C. M. 
Walsh, who published in 1901 a treatise on the measurement of 
the price level and later published a book entitled The Problem 
of Estimation, which further developed the application of prob- 
ability theory to economics.² 

Since about 1915 the important problem of the technique of 
index number construction has been attacked by a number of 
scholars. Prof. Wesley C. Mitchell was a pioneer in the explora- 
tion of the technical problems involved and a major part of their 
solution; others have done important work of this character 
during recent years, especially the economists and statisticians 
in government or semigovernment agencies, such as the Bureau 
of Labor Statistics and the Federal Reserve Board. 

Interpretation of the problems involved in the making of 
index numbers may be facilitated by analysis of two of the main 
principles involved: (1) the concepts of absolutes vs. relatives 
and (2) the application of the theory of stratified sampling to the 
particular problem of the making of an index number. 

Conversion of Absolute Numbers to Relative Numbers 
Absolutes. An absolute is an expression of the number of things 
being considered, measured by an appropriate unit, as 1,000 
bushels of wheat or 50 acres of land. A simple absolute taken 
by itself is of little importance. The number of people in a 
country is of no particular significance unless a comparison is 
desired, for example, a comparison with the natural resources 
of the country or with the population at some other point in time 
or in some other country. 

Prices are ordinarily conceived of as absolutes; that is, saying 
that the price of wheat today is one dollar a bushel refers to the 
objective thing, namely, the concrete one dollar. It is true that 
this particular absolute has a ratio aspect when it is thought of 

¹ Persons, W. M., “Statistics and Economic Theory,” Review of Economic 
Statistics, Vol. 7 (1925), pp. 185-186. Also cited in Wesley C. Mitchell, 
Business Cycles: The Problem and Its Setting (1928), p. 197. 

² The Measurement of General Exchange-Value, pp. 552-574, cited by 
Wesley C. Mitchell, “Index Numbers of Wholesale Prices in the United 
States and Foreign Countries,” Bureau of Labor Statistics, p. 9. 


as a measure of the value of wheat. But when thought of merely 
as one of the goods in an exchange, the dollar can rationally be 
considered to be an absolute. Prices accordingly are referred 
to as “absolutes.” 

Relatives. In tabular form a ready visualization is often 
accomplished by converting absolutes to relatives of some selected 
base. For example, Table 63 shows data on three important 
types of productive activity in the United States. 


Table 63. Estimated Value of Selected Types of Private Construction 
Activity in the United States 

                    New factory           Farm                  New nonfarm residential
                    construction          construction          construction
                    --------------------  --------------------  -----------------------
Years               Millions     Index*   Millions     Index*   Millions     Index*
                    of dollars            of dollars            of dollars

Annual average
1926-1929              640        100        468        100       4,066       100
1932                    78         12        125         27         641        16
1933                   128         20        175         37         314         8
1937                   391         61        360         77       1,530        38
1938                   192         30        345         74       1,515        37
1939                   200         31        340         73       1,860        46
1940                   337         53        360         77       2,077        51

Source: Survey of Current Business, Vol. 21 (February, 1941), p. 21. 
* Each index is on the base, average 1926-1929 = 100. 


Considerable difficulty is encountered in obtaining a clear 
mental picture of the comparative changes in these three series 
by study of the absolutes themselves. Was the decline in new 
factory construction more severe in the 1932-1933 depression 
than the falling off in new residential construction? Did farm 
construction suffer more severely than new residential nonfarm 
construction? Such questions, involving comparative judg- 
ments, can be answered much more quickly if each series is 
converted into relatives or simple indexes upon a common base 
period. This is illustrated in Table 63, in the columns presenting 
the indexes with average 1926-1929 as the base. 

Simple index numbers, or relatives, of this sort involve the 
notion of comparing with unity. The mind more readily 
grasps expressions in round numbers than in odd numbers; it 









further reduces mental effort if the round numbers are in mul- 
tiples of 10. From this fact arises the practice of relating prices 
or other quantity figures or absolutes of any kind to each other 
in such a way as to get a comparison based upon 1, 10, 100, 
1,000, etc. If based upon 1, they are called “proportions”; if 
based upon 100 they are called “percentages.” They are all 
relatives, or indexes. Most commonly in the United States and 
in Great Britain and many other countries, 100 is used, although 
a few, notably Australia, use 1,000. 

Even where there is but one price series, it is simpler to com- 
prehend the significance of change if the absolute prices are 
converted to a relative form. For example, the changes in 
price of coffee per pound, as shown in Table 64, are easier to 
trace from period to period when expressed in relatives. Thus, 


Table 64. Price of Coffee 
Annual averages in New York market of No. 7 Rio coffee 
(In dollars per pound) 

Item            Symbol          1926     1933     1934     1941
Price per lb.   p               0.182    0.078    0.098    0.080
Relative        (100/0.182)p    100      43       54       44

Source: Bureau of Labor Statistics, Wholesale Prices (June and December issues of 
specified years). 


let 1926 be considered 100 and the prices in other years related 
to it. The arithmetic involved is simple in principle and contains 
two steps: (1) dividing the series throughout by the base selected, 
which may more conveniently be done by multiplying throughout 
by the reciprocal of the base, and (2) multiplying by 100. This 
method, illustrated in Table 64, makes the figure for the base 
period equal to 100, and the rest fluctuate as percentages of the 
base. 
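The two steps just described may be sketched in a few lines of Python. The function name `to_relatives` is hypothetical and is used here only for illustration; the prices are those of coffee in Table 64.

```python
def to_relatives(series, base_key, scale=100):
    """Express each value of a series as a relative of its base-period value."""
    reciprocal = 1.0 / series[base_key]           # step 1: reciprocal of the base
    # step 2: multiply throughout by the reciprocal and then by 100
    return {key: round(value * reciprocal * scale) for key, value in series.items()}

# Price of coffee in dollars per pound, from Table 64.
coffee = {1926: 0.182, 1933: 0.078, 1934: 0.098, 1941: 0.080}

relatives = to_relatives(coffee, base_key=1926)
print(relatives)  # {1926: 100, 1933: 43, 1934: 54, 1941: 44}
```

Passing `scale=1` would give proportions, and `scale=1000` would give relatives on a base of 1,000.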

Another elementary idea is involved in the making of relatives, 
and that is the reduction of nonhomogeneous sets of figures to a 
homogeneous base for purposes of comparison and to simplify 
interpretation of relative change among nonhomogeneous things. 
For example, the prices of coffee per pound at different times, 
the prices of canned peaches per dozen cans, and the prices of 
wheat per bushel are all three presented in Table 65 for com- 
parison with each other. 







It is difficult to compare the price of coffee per pound with the 
price of wheat per bushel on the one hand and with the price of 
canned peaches per dozen cans on the other, as they fluctuate 
from time to time; but if all are changed to relative numbers, 
by the method already described, with 1926 as a base period, 
the comparison may easily be made. This is illustrated in 
Table 66. 


Table 65. Prices of Coffee, Canned Peaches, and Wheat¹ 

Item              1926     1933     1934     1941
Coffee            0.182    0.078    0.098    0.080
Canned peaches    1.993    1.146    1.403    1.528
Wheat             1.496    0.724    0.932    0.993

Source: Bureau of Labor Statistics, Wholesale Prices. 

¹ Prices of canned peaches are annual averages, quoted in dollars per dozen cans; prices 
of wheat are of No. 2 hard, Kansas City, quoted in dollars per bushel. 


Relatives Using a Base Period in a Time Series. Price relatives, 
and the relatives shown in Tables 65 and 66, illustrate relatives 
using a base period in a time series. Three fundamental pre- 
cautions must be observed in the use of such relatives. 


Table 66. Price Relatives of Coffee, Canned Peaches, and Wheat 
Average 1926 = 100 

Item              1926     1933     1934     1941
Coffee            100      43       54       44
Canned peaches    100      58       70       77
Wheat             100      48       62       66


1. It is almost always advisable, and sometimes it is necessary, 
to know the absolute figures as well as the relatives; else mis- 
interpretation or even misrepresentation is likely to result. 
A classic example of a use of relatives that produced misinterpre- 
tation and may perhaps have even been intended to be misrepre- 
sentation was the evidence presented in 1932 by some notable 
protectionist “statesmen” in the United States Congress. 
Following are some of the statistics they issued for public 
consumption: 





















Table 67. Some of the Large Increases in Imports during the First 
8 Months of 1932 
(In percentages) 

                                                          Percentage
Commodity                                                 increase in imports
Cod and other salt and pickled fish, from Denmark             3,729.8
Salmon, fresh or frozen, from Japan                           2,511.8
Fish in airtight containers, from Canada                      4,669.9
Cheese from Denmark                                             136.3
Wrapping paper (other than kraft) from Sweden                   615.9
Pig iron from Sweden                                            181.0
Pig iron from the United Kingdom                                611.3
Wool and other yarns from the United Kingdom                    221.2
Long-staple cotton from Egypt and British India,
  but transshipped from the United Kingdom:
    Egypt                                                     1,283.1
    British India                                             1,128.1
From Canada, fresh pork                                         237.9
Dried peas from New Zealand                                     477.3


The purpose of these statistics was to prove that a veritable 
flood of foreign goods was threatening to inundate this country, 
put out of business all its domestic producers, and lower the 
wages of domestic workers. But the statistics are not what they 
seem to be. Some of the items were so very small in the aggre- 
gate in January, 1932 (the date they began to increase according 
to the table), that they were not even listed in the extensive 
classified list of imports that is published monthly by the Depart- 
ment of Commerce. If an exceedingly small item is increased by 
1,000 per cent, it is still small. Each time it increases 1,000 per 
cent, it is only eleven times as large as before; 2,000 per cent 
means twenty-one times as large. In January, 1932, the amount 
of pig iron imported into the United States from Sweden and the 
United Kingdom combined was less than 460 tons, worth about 
$4,500, which, compared with United States domestic produc- 
tion, was a mere nothing. The imports in January, 1932, of 
wrapping paper other than kraft from Sweden amounted to 
$2,025. The imports in the same month of Egyptian cotton, 
transshipped from the United Kingdom, amounted to $982. 
The last item is particularly enlightening; it will be noted that 
the specification “transshipped from the United Kingdom” is 
carefully made. Most cotton of this type comes to the United 
States directly from Egypt and is not essentially competitive 
with American-grown cotton. 
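The relation between a percentage increase and the resulting multiple of the original figure may be set out in a short Python sketch. The function names and the $500 starting value are hypothetical and serve only to illustrate the arithmetic; they are not drawn from the import statistics.

```python
def multiple_of_original(percent_increase):
    """An increase of p per cent leaves the figure (1 + p/100) times its original size."""
    return 1 + percent_increase / 100

def value_after_increase(original, percent_increase):
    """Absolute figure reached after a given percentage increase."""
    return original * multiple_of_original(percent_increase)

print(multiple_of_original(1000))        # 11.0
print(multiple_of_original(2000))        # 21.0

# A hypothetical import item worth $500 that rises by 1,000 per cent.
print(value_after_increase(500, 1000))   # 5500.0
```

An item that starts very small therefore remains small in absolute terms even after an increase that sounds enormous when stated as a percentage.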





2. The meaning of a percentage figure is often ambiguous, and 
study of its background is necessary before it can be properly 
understood. An illustration of the misinterpretation of a per- 
centage figure can again be found in the arguments of American 
protectionists. When it is alleged that our tariffs are already 
too high, the protectionists like to reply that they are not too 
high. To prove their statement they point to the fact that a 
large percentage of the imports are on the “free list,” that is, 
that a large proportion of imports into the United States are 
charged no duty at all. This argument sounds plausible, but 
its non sequitur quality becomes evident when it is realized that 
the tariffs on dutiable goods are so high that they are virtually 
excluded from entering; it is thus the virtual exclusion of certain 
dutiable imports that causes a large proportion of imports to 
be goods on the free list. If the entire 100 per cent of imports 
were on the free list, it would mean, not that the tariff was not 
high, but that the tariff was so high that none of the dutiable 
goods could come in. 

3. In a series of coordinate relatives, it is necessary to know 
the base and to specify it for the information of others. For 
example, death-rate figures are quite meaningless unless the 
comparison is known. The death rate may be expressed as so 
many deaths per 1,000 people or per 100 people, and the statisti- 
cian should indicate which comparison is made. Death rates 
for a given disease may be expressed as the number of deaths 
per 1,000 people afflicted by the disease rather than as the number 
of deaths per 1,000 people whether exposed to it or not; again, the 
nature of the comparison should be specified. 

In simple index numbers like those given as examples in 
Tables 64 and 66, it is essential to know that the base is 1926 
or the average of 1926-1929, as the case may be. This should 
always be indicated somewhere in the title or subcaptions of the 
table or in a footnote. 

Presumption of Normality in the Selected Base. When a series 
of coordinate relatives is constructed by relating a series of 
absolutes to some selected base, the base tends to be regarded 
as the normal level. Indexes in the series greater than 100 are 
looked upon as above normal, or above par, and indexes less 
than 100 are looked upon as below normal. Since this tendency 
exists, it is always desirable to give study to the matter of 
selecting the base. Is it in fact the one that is at normal level, 
or is one of the other absolutes of the series at normal level? 

For example, in the illustration given in Table 63, should the 
annual average of new factory construction, 1926-1929, be 
regarded as normal? Was there, on the average, a normal 
amount of new nonfarm residential construction in those years? 
It might reasonably be argued that taking 3 years as a base is 
better than taking only 1 year, because an average of 3 years 
might tend to offset extreme fluctuations and produce compari- 
sons that would tend to be better than if only 1 year were used 
as a base. Thus the average of the 3 years might be about 
normal for each of the three types of construction compared, 
whereas if only 1 year were taken one or the other of the three 
types might have had an exceptionally high or low year. 

On the other hand, it may be pointed out that the years 
1926-1929 covered a range of years in which a great construction 
boom reached its peak. Consequently, all types of construction 
were above normal in all three of those years; some writers claim 
this was the peak of the greatest and longest construction boom 
in history. Construction was at a high level such as it might 
not be expected to reach again for many years, at least if the 
length of construction booms is some seventeen years from peak 
to peak, as some say it is. It may therefore be argued that 
1937 would be a better base to take, even if only 1 year is used. 
In that year the general level of economic activity seemed to 
be nearer to a normal or equilibrium than any other year in 
recent history, and certainly nearer normal than the boom year 
of 1929. But the year 1937 would be a poor base year for strike 
statistics because of the great disturbances in the coal industry 
in that year. 

Selection of the base has an important effect upon subsequent 
judgments as to the trends of the three series. If the average 
1926-1929 is taken as the base, all three of the construction 
series are still below normal in the year 1940, as the indexes 
in Table 63 show; but if 1937 is taken as the base, the 1940 level 
of new factory construction would be 86, the 1940 level of farm 
construction would be 100, and the level of new nonfarm residen- 
tial construction would be 136. If 1937 is considered normal, 
in the years 1926-1929 new factory construction averaged 63 
per cent above normal, farm construction averaged 30 per cent 
above normal, and new nonfarm residential construction was 
166 per cent above normal. 
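The effect of shifting the base from the average of 1926-1929 to 1937 may be verified from the absolute figures of Table 63. The sketch below is only illustrative; the function name `index_on_base` and the dictionary layout are hypothetical.

```python
# Absolute values from Table 63, in millions of dollars.
construction = {
    "new factory": {"1926-1929": 640, "1937": 391, "1940": 337},
    "farm": {"1926-1929": 468, "1937": 360, "1940": 360},
    "new nonfarm residential": {"1926-1929": 4066, "1937": 1530, "1940": 2077},
}

def index_on_base(series, base_period):
    """Recompute a series as indexes with the chosen base period set at 100."""
    base = series[base_period]
    return {period: round(100 * value / base) for period, value in series.items()}

rebased = {name: index_on_base(series, "1937") for name, series in construction.items()}
for name, indexes in rebased.items():
    print(name, indexes)
```

On the 1937 base the 1940 indexes come out at 86, 100, and 136, and the 1926-1929 averages at 164, 130, and 266. The factory figure, rounded here to 164, stands one point above the 63 per cent quoted in the text, a difference that presumably arises from the treatment of the fraction.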

The data shown in Table 64 and the indexes presented in 
Tables 65 and 68 may also be used to illustrate the effect of the 
base selected. In Table 66 and Fig. 133, the year 1926 is the 
base and the prices of coffee, canned peaches, and wheat are 
each set at 100 in that year; subsequent years are indexed 
accordingly. From 1926 to 1933 the greatest decline occurred 
in the price of coffee, the next greatest occurred in the price of 
wheat, and the decline in the price of canned peaches was com- 
paratively the least. Their relative recovery was in the same 
order, and all three were below normal in 1941. 

Fig. 133. Indexes of prices of wheat, canned peaches, and coffee. 1926 = 100. 

But if it is considered that they were at normal levels in 1941 
so that all are called 100 for that year, a quite different picture 
is obtained, as shown in Table 68 and Fig. 134. If these prices 


Table 68. Price Relatives of Coffee, Canned Peaches, and Wheat 
1941 = 100 

Item              1926     1933     1934     1941
Coffee            228      98       122      100
Canned peaches    130      75       92       100
Wheat             151      73       94       100


were normally related to each other in 1941, then in 1926 the 
price of coffee was more than twice normal, the price of wheat 
was well above 50 per cent over normal, and the price of canned 
peaches was 30 per cent over normal. Moreover, Fig. 134 and 
Table 68 seem to indicate that it was the price of wheat 
that was farthest below normal in 1933, with canned peaches 
close behind, and the price of coffee only slightly below normal. 
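How far the choice of base alters the apparent standing of a single commodity may be seen by computing the 1933 relatives on both bases from the prices of Table 65. This is a sketch only; the helper names are hypothetical.

```python
# Prices from Table 65 (coffee per pound, peaches per dozen cans, wheat per bushel).
prices = {
    "coffee": {1926: 0.182, 1933: 0.078, 1941: 0.080},
    "canned peaches": {1926: 1.993, 1933: 1.146, 1941: 1.528},
    "wheat": {1926: 1.496, 1933: 0.724, 1941: 0.993},
}

def relative(series, year, base_year):
    """Price in the given year as a percentage of the base-year price."""
    return 100 * series[year] / series[base_year]

def standing_in_1933(base_year):
    """Relatives of all three commodities for 1933 on the chosen base."""
    return {item: relative(series, 1933, base_year) for item, series in prices.items()}

on_1926 = standing_in_1933(1926)
on_1941 = standing_in_1933(1941)

print(min(on_1926, key=on_1926.get))  # coffee
print(max(on_1941, key=on_1941.get))  # coffee
```

Coffee is the commodity farthest below 100 in 1933 when 1926 is the base, yet it is the one nearest 100 when 1941 is the base, although the underlying prices are the same.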



Fig. 134. Indexes of prices of coffee, canned peaches, and wheat. 1941 = 100. 

For most comparisons, a year too remote in the past is not a 
desirable base. For a long time, 1913, or an average of the 
years 1909-1914, was looked upon as the best base period to use 
because it was the last normal period before the First World 
War. The farm bloc in Congress continued as late as 1941 to 
insist that farm prices should be permitted to rise to the par 
that existed before the First World War; but in 1941-1942, as 
farm prices began to rise at a more rapid rate than other prices 
so that they passed parity, the farm bloc began to insist upon a 
new definition of parity. The long survival of 1909-1914 as a 
base illustrates, not the general desirability of having a remote 
base period, but merely the power of the farm bloc. Ordinarily, 
general economic change over a 28-year period is sufficiently 
great to make such a base undesirable. 

In the 1920’s, accordingly, most comparisons came to be made,
not with prewar 1913, but with the average of 1923-1925 or with
the single year 1926; these years persisted as a base period much
longer than might ordinarily be expected because the extreme
decline of the depression of the early 1930’s made it difficult to
select a new base period. Finally, however, as the years of the
Second World War passed, the period immediately preceding it
came to be regarded as the best base for current comparisons.
In the early 1940’s the average for the years 1935-1939, or one
of those years, began to be adopted as the base period.*

* In the Survey of Current Business, Vol. 22 (November, 1942), the index
of prices received by farmers was still reported on the base of the average of
1909-1914 prices, the index of wholesale prices was still based on the 1926
average, and the index of retail prices was based on the average for 1923-
1925; but the cost-of-living index was based on the average 1935-1939,
and the indexes of the purchasing power of the dollar (wholesale, retail,
and farm) were based upon the 1935-1939 average. The indexes of national
income and industrial production were on the average 1935-1939 base.
The indexes of some manufacturing data, such as orders, shipments, and
inventories, were based upon the average for the single year 1939. The
Survey of Current Business, Vol. 22 (December, 1942) published the Bureau
of Labor Statistics indexes of wage-earner employment and weekly wages in
manufacturing industries, revised, with the average of the year 1939 as
the base.

Relative Parts of a Whole. A single absolute quantity is often
divided into several parts, and these several parts are expressed
as percentages or proportions of the whole. These are properly
called, not “index numbers,” but simply “relatives,” although
they could be referred to as “constituent relatives.” The term
index numbers, used with strict propriety, refers to a series of
relatives that is a composite of a more or less large number of
series of relative numbers. The series of relatives may be com-
bined to form a series of index numbers by any one of a number
of methods of aggregating or averaging, as will be explained
later in this chapter. Accordingly, in strict usage when an
index is an average of relatives, the term relative should be
reserved for the separate ingredients and the term index should
be used for the composite. Yet this distinction is often honored
in the breach as well as in the observance, and the student must
expect to find the term index used in place of relative.

An important item to remember in the use of constituent
relatives is that a relative increase or decrease does not neces-
sarily mean an absolute increase or decrease in the subgroup.
The absolute figure for the subgroup may, indeed, move in the
opposite direction from that indicated by the relative figures.
Constituent relatives are useful when it is required to see clearly
the relative changes. If absolute changes are desired, the raw
data must be examined. Table 69 is an example of the use of
constituent relatives. It reveals the necessity of attention to the
absolute as well as the relative figures.

Table 69. Death Rates per 100,000 Policyholders from Selected Causes
Weekly premium-paying industrial business, Metropolitan Life Insurance
Company

                              Annual rate per 100,000*   Percentage distribution
                                                           of specified causes
Specified causes of death      1940     1941     1942      1940     1941     1942
All                           531.6    553.9    501.6    100.00   100.00   100.00
Diabetes mellitus              31.1     33.8     30.2      5.85     6.10     6.02
Appendicitis                    8.6      7.6      5.4      1.62     1.37     1.08
Influenza and pneumonia        74.5     79.5     47.2     14.01    14.35     9.41
Tuberculosis (all forms)       44.9     44.0     41.9      8.45     7.94     8.35
Syphilis                       12.4     11.0     10.0      2.33     1.98     1.99
Cancer (all forms)            102.1    103.8    102.2     19.21    18.74    20.37
Diseases of the heart         233.3    245.9    236.7     43.89    44.39    47.19
Motor-vehicle accidents        17.2     20.2     21.3      3.24     3.65     4.25
Suicides                        7.5      8.1      6.7      1.41     1.46     1.34

Source: Metropolitan Life Insurance Company, Statistical Bulletin, March, 1942, p. 11.
* Policyholders, based upon first 3 months of each year.


In order to illustrate the necessity of presenting the absolute
figures as well as relative figures when constituent relatives are
used, Table 70 is drawn up with a few items taken from Table 69.
Study of the percentage distribution shown in Table 70 would
appear to indicate that the death rate from suicides increased
between the years 1940 and 1942. Actually, it decreased.
Merely its relative position became more important. The per-
centage distribution, in the absence of attention to the absolute
figures, would also lead to a tendency to exaggerate the rise in
the death rate from automobile accidents. These misleading
results are due to the change in the size of the totals for the
respective years considered: from 107.8 in 1940 to 115.4 in
1941 and to 80.6 in 1942.

Table 70. Death Rate per 100,000 Policyholders from Selected Causes
Weekly premium-paying industrial business, Metropolitan Life Insurance
Company

                                   Annual rate*          Percentage distribution
                                                           of specified causes
Specified causes of death      1940     1941     1942      1940     1941     1942
All                           107.8    115.4     80.6    100.00   100.00   100.00
Influenza and pneumonia        74.5     79.5     47.2     69.11    68.89    58.56
Appendicitis                    8.6      7.6      5.4      7.97     6.59     6.70
Suicides                        7.5      8.1      6.7      6.96     7.02     8.31
Motor-vehicle accidents        17.2     20.2     21.3     15.96    17.50    26.43

* Per 100,000 policyholders, based upon first 3 months of each year.


Especially when a small number of rates are being considered,
as in Table 70, it is necessary to study both the rates and the
percentage distribution. Actually, the study of rates is required to
answer the question: Is the rate from suicides greater in 1942
than in 1941? Study of the percentage distribution of specified
causes is required to answer the questions: In 1942 were motor-
vehicle accidents a more important cause of death than influenza
and pneumonia combined? Did motor-vehicle accidents become
relatively more important from 1940 to 1942 as compared with
the other specified causes? Important questions are answered
by each of the sets of figures; what is necessary to avoid is the
use of the wrong set of figures to answer a given question.
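The contrast between the two sets of figures in Table 70 can be verified directly. The following Python sketch recomputes the percentage distribution from the four annual rates and compares, for each cause, the change in the rate with the change in its share from 1940 to 1942. The variable names are illustrative assumptions, and the totals are taken as the sum of the four listed causes only.

```python
# Annual death rates per 100,000 policyholders, from Table 70.
rates = {
    "Influenza and pneumonia": {1940: 74.5, 1941: 79.5, 1942: 47.2},
    "Appendicitis":            {1940: 8.6,  1941: 7.6,  1942: 5.4},
    "Suicides":                {1940: 7.5,  1941: 8.1,  1942: 6.7},
    "Motor-vehicle accidents": {1940: 17.2, 1941: 20.2, 1942: 21.3},
}
years = [1940, 1941, 1942]

# Total of the selected causes in each year.
totals = {year: sum(r[year] for r in rates.values()) for year in years}

# Constituent relatives: each cause as a percentage of that year's total.
shares = {
    cause: {year: 100 * r[year] / totals[year] for year in years}
    for cause, r in rates.items()
}

for cause in rates:
    rate_change = rates[cause][1942] - rates[cause][1940]
    share_change = shares[cause][1942] - shares[cause][1940]
    print(f"{cause}: rate change {rate_change:+.1f}, "
          f"share change {share_change:+.2f} points")
```

For suicides the two changes have opposite signs: the rate falls while the share rises, because the total of the selected causes fell faster than the suicide rate did.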

Great Variety of Simple Index Numbers in Use. Hundreds of
simple index numbers are in use, and the number has been
increasing rapidly since the First World War. Indexes of the
simple type illustrated in Tables 63, 65, and 68 exist for nearly
every separate industrial activity, for thousands of prices, for
retail sales, wholesale sales, inventories, consumption of certain
types of goods, and for many other things related to economic














and social activity. Index numbers measuring production
from month to month in a large list of industries have been com-
piled and published by the Board of Governors of the Federal
Reserve System and other agencies. Indexes of marketing of
fish, dairy products, livestock, wool, and poultry and eggs have
been compiled by the Bureau of Foreign and Domestic Commerce,
and indexes of the marketing of cotton, fruits, grains and vege-
tables, lumber, and other natural products are compiled by the
same bureau. This bureau has also compiled and published a
large number of simple relative figures for new orders and unfilled
orders in a number of manufacturing industries, including iron
and steel, paper, lumber, and textiles, and another series of index
numbers of commodity stocks of manufactured goods and of raw
materials such as chemicals, foodstuffs, metals, textile materials,
and rubber products. These indexes are published currently in
the Survey of Current Business by the United States Department
of Commerce. In the League of Nations publications, indexes of
world stocks of foodstuffs and certain raw materials are available.

The United States Department of Commerce has recently
begun the compilation and publication of indexes of transpor-
tation for the United States. These monthly indexes include a
combined index of all types of transportation, commodity and
passenger, and also indexes by types of transportation, such as
an index of air transportation and a combined index of intercity
motorbus and truck transportation. The indexes are published
monthly with the base period 1935-1939 = 100 and appear in
the Survey of Current Business.* This publication contains
other illustrations of the many uses of index numbers.

* The transportation indexes are described in the Survey of Current
Business, Vol. 22 (September, 1942), pp. 20-28; Vol. 23 (May, 1943), pp.
26-27.

The use of either subordinate or coordinate relatives to aid
in the interpretation of series of data does not involve the appli-
cation of the theory of statistics or the principles of sampling,
although the gathering of the raw data may have involved the
use of the latter. The rules of comparability must be considered
when numbers are converted into relatives, however, as indi-
cated in the discussion above. When a whole series of these
simple index numbers or relatives are combined into a com-
posite index number, it is necessary to make application of the
theory of statistics. The principles of stratified sampling apply
to the construction of these composite index numbers.

Index Numbers. Application of Sampling Technique. Index
numbers are combinations of a large number of single series of
relatives by some method of aggregating or averaging. In the
field of prices, indexes of farm prices, of cost of living, of retail
prices, of wholesale prices, of wages, and of exchange rates are
some of the index numbers obtainable. Also, indexes of indus-
trial production, of trade activity, of retail trade, and of
employment are found in various sources. All these indexes are
combinations of numerous series of relatives.

From consideration of the various purposes for which index
numbers may be used, it should immediately be apparent that a
difficulty is involved. How, for example, is it possible to get
together all the facts in the United States regarding all whole-
sale prices from time to time, or all wages, or all retail prices, or all
production or consumption activities? The answer, of course,
is that it is not possible, or certainly not feasible, but that a
sample of some kind must be used. When a composite is made
up of several series, how shall they be weighted? Should they
be considered of equal importance, and if not how shall their
relative importance be determined in making up the composite?
It is upon the basis of the principles of sampling that such
index numbers are justified. As Prof. Edgeworth once said,
the task is to extricate from fallible observations a mean apt
to represent the general trend of prices, wages, production, or
whatever is being measured.¹

The demonstration by eighteenth- and nineteenth-century
statisticians, such as Süssmilch and Quételet, that a hitherto
unsuspected regularity lay hidden in numerical data about
social phenomena encouraged economists and social scientists
in the belief that known variations that had been measured might
be fair samples of the more numerous unknown variations.
Furthermore, the construction of a great variety of composite
index numbers by different investigators using different methods
has produced results of such consistency as to inspire confidence
in their use.²

To grasp the significance of an index number, it is not sufficient
to have reference only to the summary picture it presents. Just
as in the case of an average of one frequency distribution, so in
the case of index numbers, the distribution of cases is of great
importance. An index number is really a series of averages
based upon a series of frequency distributions (one frequency
distribution for each time period) of which the index number
itself is an average of some sort. A study based upon this idea
was made by Wesley C. Mitchell in his analysis of year-to-year
fluctuations of the prices recorded in the wholesale price bul-
letins of the Bureau of Labor Statistics, covering prices from
1891 to 1918 and including 232 to 348 commodities. He found
that the price changes from year to year formed a fairly sym-
metrical frequency distribution each year, and hence he con-
cluded that “when it can be shown that phenomena are
distributed approximately in this fashion, their average can safely
be accepted as a significant measure of the whole set of variations,
since even the deviations from the average are then grouped in a
tolerably definite and symmetrical fashion about the average.”³

Such an analysis seemed to establish as satisfactory the use
of an average to summarize price change from year to year, but
index numbers frequently extend over a considerable period of
time so that the general level of wholesale prices of 1942, for
example, is compared with 1926 as a base or with 1935-1939.
Year-to-year fluctuations may occur in a manner such that the
average may be used to summarize, but what of change com-
pared with some year more remote in the past? In order to
test the reliability of the method of index-number construction in
this regard, Prof. Wesley C. Mitchell applied the technique of
taking several samples: in one sample he took 212 commodities,
in another 50 commodities, and in a third sample 25 commodities
at wholesale prices, and constructed three sample index numbers
for the period 1890-1913. He found that the results from the
smaller samples were strikingly close to those of the larger
sample.⁴

¹ Cf. Mitchell, Wesley C., Business Cycles: The Problem and Its Setting
(1928), p. 204.

² Cf. Mitchell, Wesley C., “Index Numbers of Wholesale Prices in the
United States and Foreign Countries,” Bureau of Labor Statistics, Bulletin
284, p. 11.

³ Ibid., pp. 17-18.

⁴ Ibid., p. 38. The theory of sampling errors does not apply in a way
that makes possible mathematical tests from a single sample.





Stratified Sampling Method Applied. Others have also found
that whenever the principles of stratified sampling have been
followed in the construction of index numbers of wholesale
prices, the results obtained are similar to the results obtained by
the use of all available data. This inspired confidence in index
numbers extending back through the years, for which fewer price
series are available in published records, and at the same time
increased the belief that such an average expression of prices in
the form of index numbers is a valid summary picture of general
price change.

It is upon the basis of the principle of stratified sampling that
it is possible to measure by index numbers such things as the
cost of living, or the volume of production, or the general whole-
sale price level. Also, it is upon the basis of the theory of
sampling that credence can be given to index numbers; in
addition, it is due to this very fact that it is necessary to examine
the constituent parts of an index number to be sure that it
measures what it purports to measure and that it is applicable to
any particular problem for which it is desired to use an index
number.

It is necessary to notice that stratified sampling is applied to
the making of index numbers. For example, take the problem
of measuring general price movement. This is not a case in
which there is an infinite number of items, although the universe
is a very large number, and the number for which data are given
is probably less than the number for which data are unavailable,
particularly in the case of retail prices or wages. In the case of
wholesale prices the available data cover a larger portion of the
universe.

Not only is there not an infinite number of items, but the
number of available items is often not a very large one. For
example, some index numbers are based upon fewer than 50
individual index-number series. However, the universe from
which the items are taken is one concerning which a priori
knowledge exists. According to such a priori knowledge, a
representative sample can be obtained by a conscious or delib-
erate proportional selection of items from the various known
strata of the universe. For example, it is known, in the case of
wholesale prices, that the universe is made up of prices of foods,
prices of metals, prices of forest products, prices of various raw
materials, prices of semimanufactured products in a number of
fields that can be classified, and prices of final goods at whole-
sale, to enumerate a few of the known strata of this universe.
Knowing that such strata exist in the universe, the sample can
be made proportional by a deliberate stratified random sampling
procedure that would ensure proper representation in the sample
of all the various strata known to exist in the universe.*
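The idea of a proportional selection from known strata can be sketched in a few lines of Python. Everything in the sketch is hypothetical: the counts of price series in each stratum are invented for illustration, and the function name `proportional_stratified_sample` is an assumption, not a procedure taken from any agency's practice. It allots to each stratum a share of the sample equal to its share of the universe and then draws at random within the stratum.

```python
import random


def proportional_stratified_sample(strata, sample_size, seed=0):
    """Draw a sample whose strata shares mirror those of the universe.

    strata: dict mapping stratum name -> list of items in that stratum.
    Returns a dict mapping stratum name -> list of sampled items.
    Rounding may make the total differ slightly from sample_size.
    """
    rng = random.Random(seed)
    universe_size = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        # Proportional allocation, with at least one item per stratum.
        k = max(1, round(sample_size * len(items) / universe_size))
        sample[name] = rng.sample(items, min(k, len(items)))
    return sample


# Hypothetical universe: invented numbers of price series per stratum.
universe = {
    "foods": [f"food_{i}" for i in range(120)],
    "metals": [f"metal_{i}" for i in range(40)],
    "forest products": [f"forest_{i}" for i in range(24)],
    "semimanufactured products": [f"semi_{i}" for i in range(56)],
    "final goods at wholesale": [f"final_{i}" for i in range(160)],
}

sample = proportional_stratified_sample(universe, sample_size=50)
for name, items in sample.items():
    print(name, len(items))
```

With these invented counts each stratum supplies one-eighth of its series, so that the composition of the sample matches that of the universe.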

Variety of Purposes of Index Numbers. As a historical propo-
sition, the original all-pervading purpose of an index number
was to measure general exchange value, that is to say, to explain
the relationship between prices, in their general or average move-
ment, and the value of money and credit.

At the present time, however, a large number of general
indexes of prices and other phenomena are currently published,
but few even of the general price indexes purport to be a measure
of the value of money. General indexes of retail prices, indexes
of wages and pay-roll totals, indexes of prices of farm products,
metal products, manufactured goods, and raw materials, as
well as general wholesale prices, are now available. Which of
these price indexes really measures the value of money?

* Cf. King, W. I., Index Numbers Elucidated, especially pp. 64-66. This,
of course, often turns out to be a counsel of perfection in practice. The
principle is based upon the assumption that in each of the strata designated
the available data can be sampled successfully at random, and in practice
this is often not true. For illustration, in gathering prices for an index of
wholesale prices, such subgroups of prices, or strata, as sulphuric acid and
Portland cement are standardized, while house furnishings are not. From
the point of view of obtaining the best possible results with the minimum
amount of price gathering, and presumably with limited funds for the
purpose, it would be sound practice to abandon the counsel of perfection
and spend less money gathering prices of standardized articles and more
gathering prices of nonstandardized articles. The resulting disproportionate
amount of prices in the respective subgroups can then be countered by the
required adjustments in the weights used to combine the series of relatives
into index numbers.

Some statisticians and economists have held that a real
measure of the changes in the value of money and credit should
include, not only wholesale prices, but also wages, rent, and
other prices, including retail prices and perhaps the prices of
securities. Samples of each kind of price should be included in
the index of prices that aims to measure general exchange value.
On this theory, Carl Snyder, at that time statistician for the
Federal Reserve Bank of New York, compiled an “index of
general price level,” including wholesale and retail prices, wages,
rents, etc., but for certain reasons he excluded security prices.
After careful study, these various components were given certain
weights in the general composite. It should be pointed out that
this index of general price level was originated for the special
purpose of deflating data on bank clearings. Since bank clear-
ings included payments for all these things, Snyder believed that
an index of prices based upon these components could be used to
cancel out that part of change in total bank clearings due to
price change and obtain thereby an index of physical volume of
trade. Even if it is granted that this index of general price level
is valid as a deflator of bank clearings, it still remains a question
whether or not it measures the exchange value of money.
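The arithmetic of deflation referred to above may be sketched as follows. The figures are hypothetical, invented solely for illustration (they are not Snyder's data), and the function names `deflate` and `to_relatives` are assumptions. Each value of clearings is divided by the price index for the same period, and the deflated series is then expressed as relatives of its first period to give an index of physical volume.

```python
def deflate(values, price_index, base=100.0):
    """Divide each value by its period's price index (base period = 100)."""
    return [value * base / index for value, index in zip(values, price_index)]


def to_relatives(series):
    """Express a series as relatives with the first period equal to 100."""
    return [100.0 * x / series[0] for x in series]


# Hypothetical figures for three periods, invented for illustration only.
clearings = [400.0, 462.0, 540.0]            # money value of clearings
general_price_level = [100.0, 110.0, 120.0]  # price index, first period = 100

deflated_clearings = deflate(clearings, general_price_level)
volume_index = to_relatives(deflated_clearings)

print(deflated_clearings)
print(volume_index)
```

In this invented case clearings rise 35 per cent in money terms, but only 12.5 per cent once the rise in the price index is cancelled out.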

It could be argued with considerable force that such a general
measure is impracticable because of the difficulty of getting
adequate samples of rents, for example. And in any case, such
a general measure of prices does not really give the measure of
change in the purchasing power of money. The general pur-
chasing power of money may be a far more flexible and possibly
sensitive factor than this general price average would indicate.
A general price average would include an overweight of prices
largely controlled by custom, or of prices in which resistance to
change is very great for some other reason, as, for example,
because of public regulation, taxation, or their indirect effects.
The true measure of change in general exchange value may
be more nearly approximated by the wholesale price index and
perhaps even by the group of more sensitive wholesale prices.

It is not the purpose here to carry this argument to a conclusion
but merely to suggest its unsettled state. It may be significant
that the Bureau of Labor Statistics has published reciprocals of
its several indexes of prices (wholesale, retail, cost-of-living, and
farm products) as indexes of the purchasing power of the dollar
in those respective fields. The question of how to measure
general exchange value, or the purchasing power of money, con-
tinues to be a controversial one. Meanwhile, index numbers
continue to serve enormously useful special purposes whether
or not collectively or individually they measure general exchange
value. In his Treatise on Money, J. M. Keynes appears to
suggest that the exchange should be looked upon as a number of
relatively noncompeting groups of markets and that there may
be no such thing as a general purchasing power of money.¹
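The reciprocal relationship mentioned above is easily shown in a brief Python sketch. The index values are hypothetical and the function name `purchasing_power_of_dollar` is an assumption. With both series expressed on a base of 100, the purchasing-power index is 100 times 100 divided by the price index.

```python
def purchasing_power_of_dollar(price_index, base=100.0):
    """Reciprocal of a price index, scaled so the base period equals 100."""
    return base * base / price_index


# Hypothetical wholesale price index values (base period = 100).
wholesale_index = {
    "base period": 100.0,
    "later period A": 125.0,
    "later period B": 80.0,
}

purchasing_power = {
    period: purchasing_power_of_dollar(value)
    for period, value in wholesale_index.items()
}

for period, value in purchasing_power.items():
    print(period, round(value, 1))
```

A price index of 125 thus corresponds to a purchasing power of 80, and a price index of 80 to a purchasing power of 125; the reciprocal applies only within the field of prices covered by the index from which it is taken.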

In the light of such theoretical difficulties hindering the
proper measurement of the purchasing power of money by using
reciprocals of price indexes, recent attempts have been made to
construct indexes of the purchasing power of money by other
means. One notable contribution is the index of purchasing
power constructed by Murray Shields; this combines monetary
data, viz., demand deposits, foreign deposits in the United
States, foreign bank deposits in Federal Reserve banks, volume of
money in circulation, and cash in the vaults of commercial
banks.²

Construction of Index Numbers. Principal Methods. Prof.
Irving Fisher of Yale University, in a comprehensive study
of the mathematics of index-number making, found several
hundred kinds of formulas for calculating index numbers, but
it is quite unnecessary to be disturbed by this fact, since, as
he himself says, only a few of them are of any value. There are
two principal methods of calculating index numbers now in use
and generally recognized as adequate for most purposes, but
other methods are occasionally used and will therefore be
described. The most commonly used are (1) the weighted
average-of-relatives method and (2) the weighted aggregative
method. Other methods sometimes used are (3) the simple
average-of-relatives method and (4) the simple aggregative
method. Various alternative ways of applying these methods
are possible. For example, in the case of the simple average of
relatives, sometimes the median is used instead of the arith-
metical mean in order to avoid extreme variations; it is advisable
to use the median especially for very small samples. These
methods will be taken up in the order (3), (1), (4), and (2),
which is the logical method of treating them, rather than in the
order of their prevalence in use, which is that given above.

¹ Cf. also Beckhart, B. H., The New York Money Market, Vol. 2, and
King, op. cit., pp. 189-216.

² “A Measure of Purchasing Power Inflation and Deflation,” Journal of
the American Statistical Association, Vol. 35 (1940), pp. 461-470.

Simple Average-of-relatives Method. Referring again to the
simple case of the prices of coffee, wheat, and canned peaches,
already used, perhaps the first method that would suggest itself
to anyone desiring to obtain a summary figure representing
average price change would be to add up the relatives and divide
by their number, as follows:


Table 71. Composite Index Number of the Prices of Coffee, Wheat,
and Canned Peaches
1926 = 100

Commodity           1926                   1933                  1941
Coffee              $p_0/p_0 = 100$        $p_1/p_0 = 43$        $p_2/p_0 = 44$
Canned peaches      $p'_0/p'_0 = 100$      $p'_1/p'_0 = 58$      $p'_2/p'_0 = 77$
Wheat               $p''_0/p''_0 = 100$    $p''_1/p''_0 = 48$    $p''_2/p''_0 = 66$
Sum                 300                    149                   187
Average (sum ÷ 3)   100                    50                    62


The resulting composite index number shows that on the
average these three prices fell to 50 in 1933 in comparison with
100 in 1926 and then rose on the average to 62 in 1941 in com-
parison with 100 in 1926. Reducing this method to symbols,
let $p_0, p_1, p_2$ represent the prices of coffee;
$p'_0, p'_1, p'_2$ represent the prices of canned peaches;
$p''_0, p''_1, p''_2$ represent the prices of wheat.

The relatives that appear in Table 71 are thus shown also in
symbols. For example, the ratio $p''_2/p''_0$ corresponds to 66; in
these symbolical presentations the multiple 100 is always “under-
stood” and not actually written in the formula. The averages
for the three would be expressed by symbols as in Table 72:


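Table 72

Year    Average of relatives
1926    $\dfrac{\frac{p_0}{p_0} + \frac{p'_0}{p'_0} + \frac{p''_0}{p''_0}}{3}$
1933    $\dfrac{\frac{p_1}{p_0} + \frac{p'_1}{p'_0} + \frac{p''_1}{p''_0}}{3}$
1941    $\dfrac{\frac{p_2}{p_0} + \frac{p'_2}{p'_0} + \frac{p''_2}{p''_0}}{3}$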


These averages are represented by the letter P, and when N
commodity prices are averaged, instead of only three, for n years,
instead of only 3, the series of averages of relatives is repre-
sented symbolically as follows:

$$P_{00} = \frac{\sum \frac{p_0}{p_0}}{N}, \quad P_{01} = \frac{\sum \frac{p_1}{p_0}}{N}, \quad P_{02} = \frac{\sum \frac{p_2}{p_0}}{N}, \quad \ldots, \quad P_{0n} = \frac{\sum \frac{p_n}{p_0}}{N} \qquad (1)$$

The capital N refers to the number of prices, and the small sub-
script n refers to the number of years, or number of time periods,
which might be months or weeks as well as years. In general,
the subscripts to the series of p's represent the time periods, and
0 is assigned to the base time period, at which the relative equals
100. The average of the relatives likewise equals 100 in the
base time period. The primes refer to different commodities.
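The simple average of relatives described above can be checked with the figures of Table 71. The Python sketch below is illustrative only; the variable names are assumptions. It also computes the median of the relatives, the alternative sometimes preferred for very small samples.

```python
from statistics import mean, median

# Price relatives on the 1926 base (1926 = 100), from Table 71.
relatives = {
    1926: {"coffee": 100, "canned peaches": 100, "wheat": 100},
    1933: {"coffee": 43, "canned peaches": 58, "wheat": 48},
    1941: {"coffee": 44, "canned peaches": 77, "wheat": 66},
}

# Simple average of relatives: sum of the relatives divided by their number N.
simple_index = {
    year: mean(list(values.values())) for year, values in relatives.items()
}

# Alternative: the median of the relatives in place of the arithmetic mean.
median_index = {
    year: median(list(values.values())) for year, values in relatives.items()
}

for year in relatives:
    print(year, round(simple_index[year]), median_index[year])
```

Rounded to whole numbers, the means agree with the averages of 100, 50, and 62 shown in Table 71.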
Weighted Average-of-relatives Method. The simple average of
relatives involves the assumption that changes in the several
prices to be combined are of equal importance, but this may not
be true. Consequently, the idea of weighting the component
price relatives in accordance with weights that are considered to
reflect their relative importance has been developed.

The weights are commonly based upon some rational
consideration such as the quantities consumed in a given
representative year, the quantities produced, family budget figures, or
some other criterion. Suppose, after considering all available
information on the subject, changes in the price of a pound of
coffee are considered thrice as important as changes in the price
of a dozen cans of peaches and changes in the price of wheat per
bushel are judged twice as important as changes in the price of a
pound of coffee. Convenience of calculation will be attained
if the numbers used as weights are so arranged that they will
sum up to 1, 10, or 100, because the averaging process will then
be a simple matter of changing decimal points in the sum of the
weighted relatives. Such a manipulation of the quantities
representing weights will have no effect on the final answer and
will reduce the amount of work considerably if the problem is a
long one involving, say, several years of monthly indexes. In
the illustration used above, the weights are as follows:

Coffee, 3 = $w$
Canned peaches, 1 = $w'$
Wheat, 6 = $w''$



A weighted average-of-relatives index number of these three
commodities would be calculated as illustrated in Table 73.

Table 73.—Index Number of the Prices of Coffee, Wheat, and
Canned Peaches
Weighted average of relatives, 1926 = 100

Commodity           1926               1933              1941

Coffee              100 X 3 =  300     43 X 3 = 129      44 X 3 = 132
Canned peaches      100 X 1 =  100     58 X 1 =  58      77 X 1 =  77
Wheat               100 X 6 =  600     48 X 6 = 288      66 X 6 = 396
                          10)1000            10)475            10)605
Weighted average           100               47.5              60.5


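In symbolic language, the weighted average of relatives
illustrated in Table 73 is as follows:

$$P_{00} = \frac{\sum w \dfrac{p_0}{p_0}}{\sum w}, \quad P_{01} = \frac{\sum w \dfrac{p_1}{p_0}}{\sum w}, \quad P_{02} = \frac{\sum w \dfrac{p_2}{p_0}}{\sum w} \qquad (2)$$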
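
The work sheet of Table 73 can be reproduced with a short Python sketch. It assumes the relatives and the weights 3, 1, and 6 given in the text; the variable and function names are illustrative choices made here. The last lines check the remark that rescaling the weights (here so that they sum to 1 instead of 10) leaves the answer unchanged.

```python
# Price relatives (1926 = 100) and weights from the chapter's illustration.
relatives = {
    "coffee": {1926: 100, 1933: 43, 1941: 44},
    "canned peaches": {1926: 100, 1933: 58, 1941: 77},
    "wheat": {1926: 100, 1933: 48, 1941: 66},
}
weights = {"coffee": 3, "canned peaches": 1, "wheat": 6}


def weighted_average_of_relatives(relatives, weights, year):
    """Sum of weight x relative, divided by the sum of the weights."""
    weighted_sum = sum(weights[c] * relatives[c][year] for c in relatives)
    return weighted_sum / sum(weights.values())


weighted_index = {
    year: weighted_average_of_relatives(relatives, weights, year)
    for year in (1926, 1933, 1941)
}

# Rescaled weights: same proportions, but summing to 1 instead of 10.
scaled_weights = {c: w / 10 for c, w in weights.items()}
rescaled_index = {
    year: weighted_average_of_relatives(relatives, scaled_weights, year)
    for year in (1926, 1933, 1941)
}

print(weighted_index)
```

Only the proportions between the weights matter, which is why the authors are free to choose weights that make the division convenient.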


Instead of weighting by arbitrary weights, the actual
quantities of the articles consumed or produced in the base year are
sometimes used as weights, if such data are available. The
quantities of the base year or base period are retained
throughout, instead of getting the new quantities each year or each time
period, for two reasons: (1) because it is difficult if not impossible
to get quantity figures for every year and (2) because the
proportions between these quantities are not likely to change
greatly over short periods of time. If, after a given base period
has been used for some time, it is discovered that one or several
of the quantity weights are at variance with current conditions
that seem to be likely to persist, the system of weights may be
revised. In the various index numbers it constructs, the Bureau
of Labor Statistics keeps continually on the watch for such
changing conditions and when desirable changes the weighting
system.

The symbols for quantity weights are series of q’s, as follows:

$q_0$ = quantity of coffee in 1926
$q_1$ = quantity of coffee in 1933
$q_2$ = quantity of coffee in 1941

$q'_0$ = quantity of canned peaches in 1926
$q'_1$ = quantity of canned peaches in 1933
$q'_2$ = quantity of canned peaches in 1941

Wheat would be the same arrangement of a series of $q''$’s. The
resulting index number, using base-year quantities as weights for
the relatives averaged, would be as follows:

$$P_{00} = \frac{\sum q_0 \dfrac{p_0}{p_0}}{\sum q_0}, \quad P_{01} = \frac{\sum q_0 \dfrac{p_1}{p_0}}{\sum q_0}, \quad P_{02} = \frac{\sum q_0 \dfrac{p_2}{p_0}}{\sum q_0} \qquad (3)$$


Simple Aggregative Method. As suggested by its name, a
simple aggregative index is the sum of the absolute prices,
without first changing them to relatives. Thus the raw
prices of coffee, canned peaches, and wheat for 1926, then for
1933, and then for 1941 would be added together to give
the index. This seems to be combining nonhomogeneous
things, and it is; nevertheless, there is one famous and at one
time widely used index that was based upon this method.
Such was Bradstreet's index of wholesale prices, which continued
in use for many years. Following is an illustration of the method
of Bradstreet’s index of wholesale prices:¹


Prices, Dollars per Pound
0.0007   Connellsville coke, southern coke
0.001    Bituminous coal, brick, iron ore
0.002    Anthracite coal
0.003    Salt
0.004    Bessemer pig iron
. . . .
0.31     Alcohol
0.50     Australian wool
0.52     Quicksilver
0.84     Rubber

9.8530   The sum, which is the index

According to this method, the index number of prices does
not assume the form of a relative, but appears as follows:

Table 74.—Bradstreet’s Index

1933          Index        1934          Index

October       $9.0512      January       $8.8329
November       8.8480      February       9.0110
December       8.8126      March          9.2627

¹ A good description of Bradstreet's index is contained in W. C. Mitchell,
“Index Numbers of Wholesale Prices in the United States and Foreign
Countries,” Bureau of Labor Statistics, Bulletin 284, pp. 161-165. Other
price indexes are also discussed in that source, such as Dun's, Gibson's,
the Annalist, War Industries Board, Federal Reserve Board, and the
Bureau of Labor Statistics.

The index can readily be converted into a series of relatives
upon any chosen base; the Survey of Current Business published
Bradstreet’s index converted into relatives, with the monthly
average of 1926 = 100, until November, 1937, when compilation
of the index was discontinued.¹

Little rational justification can be mustered to the defense of
such an index as Bradstreet’s, except that it worked well. Using
approximately 96 commodities, it gave an index number that
reflected accurately the changes in wholesale prices, as tested
by more elaborately conceived and compiled indexes of
wholesale prices later introduced into the field. Bradstreet’s
index was the pioneer in the history of price indexes in the
United States, having been started in 1897. The conversion of
all prices into prices per pound gives the effect of a concealed
weighting, but no logical basis can be found for such a system of
weighting. The symbolic expression of this index is as follows:

$$\sum p_0, \quad \sum p_1, \quad \sum p_2, \quad \ldots, \quad \sum p_n \qquad (4)$$


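When reduced to relatives and some base is taken as 100, it is
as follows:

$$P_{00} = \frac{\sum p_0}{\sum p_0}, \quad P_{01} = \frac{\sum p_1}{\sum p_0}, \quad P_{02} = \frac{\sum p_2}{\sum p_0}, \quad \ldots, \quad P_{0n} = \frac{\sum p_n}{\sum p_0} \qquad (5)$$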
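
A small Python sketch may make the concealed weighting of a simple aggregative index easier to see. The three per-pound prices below are hypothetical figures chosen only for illustration (they are not Bradstreet's quotations for any date), and the function name is likewise an assumption of this sketch.

```python
# Hypothetical base-period prices in dollars per pound (illustration only).
base_prices = {"anthracite coal": 0.002, "alcohol": 0.31, "rubber": 0.84}


def simple_aggregative_relative(given_prices, base_prices):
    """Sum of given-period prices as a percentage of the sum of base prices."""
    return 100 * sum(given_prices.values()) / sum(base_prices.values())


# Case 1: the cheapest article doubles in price, the others are unchanged.
coal_doubles = {**base_prices, "anthracite coal": 0.004}
# Case 2: the dearest article doubles in price, the others are unchanged.
rubber_doubles = {**base_prices, "rubber": 1.68}

effect_of_coal = simple_aggregative_relative(coal_doubles, base_prices)
effect_of_rubber = simple_aggregative_relative(rubber_doubles, base_prices)

print(round(effect_of_coal, 1), round(effect_of_rubber, 1))
```

The same 100 per cent rise moves the index by a fraction of a point in the first case and by more than 70 points in the second, solely because of the unit in which each article happens to be quoted.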


While the concealed weighting system of Bradstreet’s index is
accidental, or haphazard, depending upon the units in which
goods are quoted, it has the effect of making the high-priced
articles dominant. Its success as a good index of price change
was due to the fact that there was a skillful or at least a
propitious use of stratified sampling in the selection of the prices
used.

¹ Cf. Survey of Current Business, Supplement, Vol. 18 (1938), p. 168.
Monthly figures for the index are available from 1903 and annual figures
from 1890. See 1932 Supplement, pp. 28-29, and 1936 Supplement, p. 15.
Also see Bureau of Labor Statistics, Bulletin 173 (July, 1915).

Weighted Aggregative Method. In making index numbers by
the aggregative method, it is usually considered that weights
are required, for the same reason that they are regarded as
necessary in constructing an index number by the
average-of-relatives method. The most reasonable kind of weight would
seem to be the quantities of the several commodities produced or
consumed or marketed. Such figures have become increasingly
available since the time when such indexes as Bradstreet’s were
originally conceived and developed.

The last four or five decennial censuses of the United States
have included more and more complete data on physical
quantities of production and, more recently, data on retail and
wholesale trade, and, in the years since the First World War,
yearly figures have been available on physical quantities of goods
in stock and physical production of some goods, through the
activities of the United States Departments of Commerce and
Agriculture. If it is assumed that the method of weighting is
one that uses actual quantity figures, there are two methods of
weighting the price aggregates in order to construct the index
number. The first method is called “weighting by base-year
quantities.” The second method is called “weighting by
given-year quantities.”

The desirability of weighting by base-year quantities has a
twofold explanation: (1) In spite of the increased availability
of quantity figures, there are still many commodities for which
quantity figures are not easily available for every year, but a
large number of such quantity figures, classified so as to be
useful for weighting purposes, are available for the census years.
(2) With few exceptions, the proportional changes in the
quantities or value weights from year to year are not sufficiently great
to cause large errors if these proportions are assumed to remain
constant for several years in succession. Adjustments in the
quantity or value weights can be made in the case of rapidly
growing or rapidly declining industries, but the necessity for such
changes within 10-year periods will not include a very large
number of commodities. As a purely practical matter, the
choice of base-year weighting instead of given-year weighting
gives adequate results with much less statistical calculation, as
well as much less statistical research in seeking data for use
as weights.

United States Bureau of Labor Statistics. Construction of
Index Illustrated by Practice of an Official Bureau. In the
United States, the Bureau of Labor Statistics is one of the
most important official compilers of index numbers of various
kinds. From its publications can be illustrated how the various
matters discussed above are brought into practice and how
diligent must be the researcher, how alert the statistician, to
new problems of weighting, sampling, and the like.

In 1943 the Bureau of Labor Statistics of the United States
Department of Labor was compiling and publishing weekly,
monthly, and annual index numbers of wholesale commodity
prices. In a revision made in 1927, when the base period was
changed from the 1913 average to the 1926 average, a new
weighting system was adopted; it was then decided to revise the
quantities used as weighting factors every 2 years, as the results of
each new biennial census of manufactures became available.
At the same time, the number of price series was increased from
404 to 550. Another revision was made in 1931, when the
number of price series was changed from 550 to 784 and some
rearrangement of the items in the groups and subgroups was made.
No change was made in 1931 in the method of calculating the
indexes. In December, 1942, according to the Survey of Current
Business, the monthly index of wholesale prices compiled by the
Bureau of Labor Statistics was made up of 889 quotations.¹

¹ Survey of Current Business, Vol. 22 (December, 1942), p. S-3. On the
history of its compilation, weighting, etc., see Bureau of Labor Statistics,
Bulletin 181, 415, 453, 521; “Wholesale Prices,” Serial No. R. 1434
(December, 1941); “Revised Method of Calculation of the Wholesale Price Index
of the United States Bureau of Labor Statistics,” Serial No. R. 666.

The weights used for farm products are based on averages
for 3-year periods, changed every 2 years in order to keep the
weights up to date. Thus, for the years 1932 and 1933, the
weights used for farm prices were based upon averages of
quantities marketed in the years 1927, 1928, and 1929; and for the
years 1934 and 1935 the weights used for farm-products prices
were based upon averages of quantities marketed in the years
1929, 1930, and 1931. For all other groups of commodity prices,
the weights used are averages of quantities produced for sale, to
which has been added the average of imports for consumption, in
the last two completed census periods. For example, for the
years 1932 and 1933, the weights were based on average census
data (plus imports for consumption) for the years 1927 and 1929,
whereas for the years 1934 and 1935 the weights were based on
average census data (plus imports for consumption) for the years
1929 and 1931. In cases where census data are lacking,
estimates are made of the quantities of the various commodities
marketed, based on the best information available from
governmental and reliable private sources, and these estimates are used
as weighting factors. Commodities are added or dropped from
time to time as they become important or cease to be important
in the markets.¹

¹ If a “price” index, as contrasted with a “realized price” index, is
desired, it is necessary to keep the weights constant. In constructing a
realized price index the weights may be changed, but in revising the weights
the index must be calculated by using both sets of weights for the
overlapping year or period when the change is made. By “realized price” is
meant the dollars covering the transaction, divided by the units involved in
the transaction. When the lack of continuity of specifications makes it
hard to define the commodity, as with automobiles, the dollars of sales for
each general type (sedans, coupes, etc.) divided by the number of such
units, in other words, the “realized price,” has received the endorsement
of competent price experts as an acceptable quotation to use in price
statistics.

During the period of depression following 1932, when the
data on manufactured output became violently disrupted, the
weights based upon averages of the years 1929-1931 were
retained. Most of the prices continued in 1943 to be weighted
by the averages of the 1929-1931 census data, but for certain
commodity groups new weights had begun to be based upon
special studies of those groups. Thus, in April, 1941, the Bureau
published a study of the “Wholesale Price Trends of Carpets and
Rugs,” revising its price series for this group. This study
included new weights for the prices in this group, according to
their “importance in the country's markets in 1939.”

The quantity weight used for each of the series, the unit in
which each is priced, and the 1939 value of each item expressed
as a percentage of the aggregate value of all carpet and rug items
in the Bureau’s indexes are shown in Table 75.

Table 75

Price series                 Unit in which priced      Weight

Axminster carpets            Lineal yard                7,077
Axminster 9 X 12 rugs        Each                       2,015
Plain velvet carpets         Lineal yard                6,424
Plain velvet carpets         Square yard                7,861
Wilton 9 X 12 rugs           Each                         612

Source: Bureau of Labor Statistics, mimeographed publication, “Wholesale Price Trends
of Carpets and Rugs,” April, 1941, pp. 10-17.

The use of data for 1939 departed from the “general practice
of using the 1929 and 1931 data for weighting in the Bureau's
wholesale price indexes,” in order to provide weights for the
individual items that reflected their relative importance more
nearly in accordance with present-day sales. The Axminster
type has long been the most popular. The relative importance
of Wilton carpets and rugs has increased considerably since the
depression of the early 1930’s, and they have regained much of
their earlier popularity. Prior to the depression and before
plain velvets became popular, the importance of Wiltons, on a
dollar basis, was almost as great as that of Axminster carpets
and rugs. During the depth of the depression, when consumer
incomes were greatly reduced, there was a lessened demand for
Wiltons, apparently because they were much more expensive,
on the average, than Axminsters.

The study of the carpet and rug price series is presented to
illustrate the alertness of the Bureau in relation to the problem of
compiling and publishing its indexes of wholesale prices. Its
activity extends to other groups of price series as well. For
example, beginning with January, 1938, the results of a survey
of farm-machinery wholesale prices were incorporated for the
first time in the Bureau of Labor Statistics general indexes of
wholesale prices. In 1941 the Bureau began publishing weekly
index numbers of waste and scrap materials, carrying the index
back to January, 1939. In “Wholesale Prices” (June, 1941),
the Bureau published a monthly index of standard machine-tool
prices, including 11 types of standard nonspecialty machine
tools, carrying the index back to January, 1937. These new
indexes are calculated on August, 1939, as a base; the monthly
index of wholesale prices continued in 1943 to be based on the
average of 1926 as 100.



Method of Computation Illustrated. The weighted aggregative
method of computing an index number of prices is illustrated in
Table 76, using only five price series. The five price series
selected for illustration are those for carpets and rugs, for which
the Bureau’s weights are shown in Table 75. A procedure
similar to that illustrated in Table 76 is used by the Bureau,
but with 889 price quotations instead of 5.


Table 76.—Work Sheet Illustrating Calculation of Weighted
Aggregative Index
Base-period weights

                                      Average price       Weights    Prices multiplied by weights
Item                                1935-1939     1941                 1935-1939        1941
                                      $p_0$       $p_1$     $q_0$      $p_0 q_0$      $p_1 q_0$
(1)                                    (2)         (3)       (4)         (5)            (6)

Axminster carpets                     1.567       2.014     7,077      11,090         14,253
Axminster 9 X 12 rugs                22.745      27.936     2,015      45,831         56,291
Plain velvet carpets (lineal yard)    1.772       2.356     6,424      11,383         15,135
Plain velvet carpets (square yard)    2.581       3.266     7,861      20,289         25,674
Wilton 9 X 12 rugs                   40.007      50.521       612      24,484         30,919

Aggregate, $\sum p q_0$                                                 113,078        142,272
Index, $\sum p q_0 / \sum p_0 q_0$                                       100.00         125.82

Source: Cutts, Jesse M., and Samuel J. Dennis, “Revised Method of Calculation of
Wholesale Price Index,” Journal of the American Statistical Association, Vol. 32 (1937);
also reprinted by the Bureau of Labor Statistics as Serial No. R. 666.

Accordingly, the index number of the wholesale prices of
carpets and rugs is 100.00 for the years 1935-1939, the base
period, and 125.82 for 1941. The latter figure is obtained by
taking $P_{01} = \sum p_1 q_0 / \sum p_0 q_0$. From the system of symbols already
introduced, the symbolic presentation of this form of index
number is as follows:

$$P_{00} = \frac{\sum p_0 q_0}{\sum p_0 q_0}, \quad P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0}, \quad \ldots, \quad P_{0n} = \frac{\sum p_n q_0}{\sum p_0 q_0} \qquad (6)$$

This is illustrated in the figures of Tables 75 and 76, the 1941
index being $\dfrac{142{,}272}{113{,}078} \times 100 = 125.82$.
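
The work sheet calculation can be checked with a brief Python sketch. The prices and weights below are transcribed from the carpet and rug illustration as far as they can be read and should be treated as assumptions to be verified against the printed table; the row labels and variable names are choices made for this sketch.

```python
# (series, base-period price p0, 1941 price p1, base-period weight q0)
# Figures transcribed from the carpet and rug work sheet; verify before reuse.
rows = [
    ("Axminster carpets", 1.567, 2.014, 7077),
    ("Axminster 9 x 12 rugs", 22.745, 27.936, 2015),
    ("Plain velvet carpets, lineal yard", 1.772, 2.356, 6424),
    ("Plain velvet carpets, square yard", 2.581, 3.266, 7861),
    ("Wilton 9 x 12 rugs", 40.007, 50.521, 612),
]

# Price aggregates, both weighted by the same base-period quantities.
sum_p0q0 = sum(p0 * q0 for _, p0, _, q0 in rows)
sum_p1q0 = sum(p1 * q0 for _, _, p1, q0 in rows)

# Weighted aggregative index with base-period weights.
index_1941 = 100 * sum_p1q0 / sum_p0q0

print(round(sum_p0q0), round(sum_p1q0), round(index_1941, 2))
```

Because the divisor is the same for every period, only one new column of cross products is needed each time the index is brought up to date.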






Price Indexes and Quantity Indexes. Aggregative Index Using 
Given-year Weights. If given-year quantity weights are avail- 
able and used for computing an index, it must be noted that the 
following system of ratios would merely give an index of chang- 
ing aggregate values, without distinguishing which part of the 
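change is due to price change and which part to quantity change:

$$R_{00} = \frac{\sum p_0 q_0}{\sum p_0 q_0}, \quad R_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}, \quad R_{02} = \frac{\sum p_2 q_2}{\sum p_0 q_0}, \quad \ldots, \quad R_{0n} = \frac{\sum p_n q_n}{\sum p_0 q_0}$$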


Such an index is an index of aggregate value, made up partly of
changes in quantity and partly of changes in price. In order,
therefore, to extract from it that part of the change which is due
solely to price change, the base-year prices must be multiplied
throughout by the given-year weights. This fact makes the
given-year weighting method a very long one to calculate; it loses
the advantage, inherent in the aggregative index weighted by
base-year quantities, of having a constant divider in securing the
index. In addition, the method of weighting by given-year
quantities necessitates the two sets of cross products for each
year: each year's prices multiplied by that year's quantities
and by the base-year quantities. Following is the symbolic
expression of the aggregative index of prices weighted by
given-year quantities:

$$P_{00} = \frac{\sum p_0 q_0}{\sum p_0 q_0}, \quad P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1}, \quad \ldots, \quad P_{0n} = \frac{\sum p_n q_n}{\sum p_0 q_n} \qquad (7)$$
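
The difference between an index of aggregate value and a price index weighted by given-year quantities can be seen in a minimal Python sketch. The two commodities and all of their prices and quantities are hypothetical, and the helper name `aggregate` is an assumption of this sketch.

```python
# Hypothetical prices and quantities for two commodities (illustration only).
p0, q0 = [2.00, 5.00], [10, 4]   # base year
p1, q1 = [3.00, 6.00], [12, 5]   # given year


def aggregate(prices, quantities):
    """Sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


# Index of aggregate value: both prices and quantities change.
value_index = 100 * aggregate(p1, q1) / aggregate(p0, q0)

# Price index weighted by given-year quantities: the base-year prices are
# multiplied by the given-year quantities, so only prices differ between
# numerator and denominator.
given_year_price_index = 100 * aggregate(p1, q1) / aggregate(p0, q1)

print(round(value_index, 1), round(given_year_price_index, 1))
```

The divider `aggregate(p0, q1)` has to be recomputed for every given year, which is the extra labor the text describes.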


Index of Quantities Weighted by Prices. An advantage of the
given-year weighting method is that an index of quantities
weighted by prices can be obtained as a by-product, with
comparatively little additional calculation. The same numerators as
those used in Eq. (7) can be used to calculate an index of
quantity change weighted by given-year prices. For each year,
given-year prices are multiplied by base-year quantities, and
these aggregates are used as dividers. This will give an index of
quantity weighted by given-year prices, as follows:



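$$Q_{00} = \frac{\sum p_0 q_0}{\sum p_0 q_0}, \quad Q_{01} = \frac{\sum p_1 q_1}{\sum p_1 q_0}, \quad Q_{02} = \frac{\sum p_2 q_2}{\sum p_2 q_0}, \quad \ldots, \quad Q_{0n} = \frac{\sum p_n q_n}{\sum p_n q_0} \qquad (8)$$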




Unfortunately, this advantage in the given-year weighting
method is largely imaginary because the quantity data are not
available soon enough for short periods of time to make it
practicable to construct monthly or weekly indexes. In any case, it is
also possible to obtain a quantity index weighted by given-year
prices, using the following equation, which would provide a
much simpler method:


ipogo’ i-Pog» 2pDgo 


(9) 
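
How a quantity index weighted by given-year prices relates to the other aggregates can be illustrated with a short Python sketch. The data are hypothetical, and the shortcut shown (dividing the index of aggregate value by the price index weighted by base-year quantities) is offered here as one relation that follows from the definitions, not as a restatement of the authors' equation.

```python
# Hypothetical prices and quantities for two commodities (illustration only).
p0, q0 = [2.00, 5.00], [10, 4]   # base year
p1, q1 = [3.00, 6.00], [12, 5]   # given year


def aggregate(prices, quantities):
    """Sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


# Direct calculation: given-year prices applied to both sets of quantities.
quantity_index = 100 * aggregate(p1, q1) / aggregate(p1, q0)

# Shortcut under the stated assumption: value index divided by the
# price index weighted by base-year quantities.
value_ratio = aggregate(p1, q1) / aggregate(p0, q0)
base_weighted_price_ratio = aggregate(p1, q0) / aggregate(p0, q0)
quantity_index_shortcut = 100 * value_ratio / base_weighted_price_ratio

print(round(quantity_index, 2), round(quantity_index_shortcut, 2))
```

The two results agree because the base-year aggregate cancels, so a change in aggregate value splits exactly into a price component and a quantity component.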


Quantity indexes are constructed, however, by other methods,
usually with more general application of stratified sampling and
with other weights than prices, largely because of the difficulty
of obtaining quantity data. Not only are these other methods
more convenient to calculate, but they make it possible to
handle matters having to do with weighting and bias in the
results. In using equations like Eqs. (8) and (9), it is often very
difficult to appraise the inaccuracies due to bias inherent in the
method.

Quantity Indexes and Business Barometers. Indexes of
Quantity of Trade or Production. Several statisticians and
economists made attempts, especially in the years immediately
following the First World War, to construct an index that would
trace variations in the physical volume of production or trade.
Pioneer efforts to construct such indexes, based upon scant
material and with little in the way of a statistical theory to
guide them, were made before the First World War by Wesley C.
Mitchell, Irving Fisher, and Edwin W. Kemmerer. During the
war and postwar period important progress was made, especially
by Edmund E. Day, Warren M. Persons, and others. In 1923,
the latter published an index of trade for the United States,
beginning with the year 1903.¹ The index of production is
based very heavily on the index of employment; it might
therefore fail to reflect properly the results of technological advance.

¹ “An Index of Trade for the United States,” Review of Economic Statistics,
Preliminary Vol. 5 (April, 1923), pp. 71-78. Cf. also Garfield, Frank R.,
“General Indexes of Business Activity,” Federal Reserve Bulletin, Vol. 26
(1940), pp. 490-501.


After these experiments by pioneering individuals, several
government agencies and privately financed research organizations
took up the task of developing indexes of trade and production.
The most widely known and now most currently used index of
industrial production for the United States is that compiled and
regularly published in the Federal Reserve Bulletin by the Research
Division of the Board of Governors of the Federal Reserve
System. This index is compiled from 95 individual series of monthly
data, representing about 85 per cent of the total industrial
production of the United States. The series include 22
durable-goods manufacturing industry series, 63 nondurable-goods
manufacturing industry series, and series representing production
of fuels and metals. This index is also regularly reproduced in
the Survey of Current Business, published by the United States
Department of Commerce.¹ A reproduction of the entire index,
with its component parts, 1923-1940 by months, with the
average 1935-1939 = 100 as the base, can be found in the Federal
Reserve Bulletin.²

It is characteristic of the indexes of physical volume of
production or trade that they consist of combinations of various series
upon the basis of stratified sampling, the weights for the
representative series being devised upon a priori knowledge
concerning the importance of certain groups of activity in relation to the
whole of business activity. These indexes treat the separate
series statistically before putting them together. For example,
they remove seasonal variation and trend from the separate
series and thus average together the cycles of the various separate
series into the composite. The method of averaging employed
is generally the aggregative method, although since 1940 the
Federal Reserve Board uses an average of relatives weighted by
quantities so that the final result is equivalent to what would be
obtained by using the weighted aggregative method.³

¹ Federal Reserve Bulletin, Vol. 13 (February, March, 1927), Vol. 17
(February, September, 1931), Vol. 18 (March, 1932); for adjustments made
necessary by the 1942 world war, see Vol. 27 (1941), pp. 878-881; cf. Survey
of Current Business, Vol. 20 (1940), pp. 11-17.

² Vol. 26 (1940), pp. 825-882; see also “Answer to Critics of the Index,”
pp. 1047-1049. See also Woodlief, Thomas, and R. Conklin,
“Measurement of Production,” Federal Reserve Bulletin, Vol. 26 (1940),
pp. 912-924.

³ See Woodlief and Conklin, op. cit.



Computation of Weights Illustrated. The index of
manufactures published by the Federal Reserve Board is weighted by the
total value added by manufacture in the case of all
manufacturing industries, and the index of mineral production is weighted
by value of mineral products. The sum of these two is the
index of production. The individual production series of which
the manufacturing index is composed are weighted, as nearly as
possible, according to the same principle.¹ Accordingly, the
total value added by manufacturing industries in 1937, as
reported by the United States census, was distributed among the
16 groups represented in proportion to the value added for each
group, and then derived group totals were subdivided among
industries and finally among individual products in a similar
manner. Each individual series is thus assigned a hypothetical
value-added figure, which is then divided by the relative of the
1935-1939 compared with 1937 in order to convert 1937
value-added figures to 1935-1939 base. The derived 1935-1939 figures
for each series are then expressed as percentages of their own
total to obtain the weights. These percentages represent the
estimated relative importance of each series in the 1935-1939
base period and are the weights applied to the relatives in
combining them into the index of production. Table 77
reproduces a summary of these weights.

¹ For industry series in which census data on value added by manufacture
are not available, other criteria had to be used, such as total value of
manufactured product, raw materials consumed, or man-hours worked. See ibid.,
pp. 917-918.

Table 77.—Relative Importance of Industry Groups and Selected
Industries Included in the Federal Reserve Board Index
of Industrial Production
(Per cent of total with 1937 weights)

Series                                        1937 Weights

Industrial production                            100.00
  Manufactures                                    84.80
    Durable manufactures                          37.93
      Iron and steel                              11.00
      Machinery production                        10.81
      Transportation equipment                     5.92
      Nonferrous metals and their products         2.81
      Lumber and its products                      4.30
      Stone, clay, and glass products              3.00
    Nondurable manufactures                       46.87
      Textiles and their products                 11.22
      Leather and its products                     2.23
      Manufactured food products                  10.92
      Alcoholic beverages                          1.84
      Tobacco products                             1.24
      Paper and its products                       3.13
      Printing and publishing                      6.44
      Petroleum and coal products                  2.14
      Products of chemicals                        6.27
      Rubber products                              1.39
  Minerals                                        15.20
    Fuels                                         13.01
    Metals                                         2.19

Source: Condensed from Federal Reserve Bulletin, Vol. 26 (1940), p. 919.

Using the weights shown in Table 77, the equation for the 
Federal Reserve Board’s index of industrial production is as 
follows: 

in which 


$P_{37}$ represents the value (or value added) per unit of output in
the weight-base period.
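
The weighting procedure described above can be sketched in Python. Everything in the sketch is an assumption made for illustration: the three series and their figures are hypothetical (not Federal Reserve data), the relative used for the conversion is taken to be 1937 output as a ratio to the 1935-1939 average, and the index is taken to be the weighted average of output relatives. Under those assumptions the result equals the aggregate of given-period output valued at $P_{37}$ divided by the aggregate of base-period output valued at $P_{37}$.

```python
# Hypothetical series (illustration only): 1937 value added, 1937 output,
# average 1935-1939 output (base), and output in the given period.
series = {
    "steel": {"value_added_1937": 1100.0, "q_1937": 110.0, "q_base": 100.0, "q_given": 130.0},
    "textiles": {"value_added_1937": 900.0, "q_1937": 90.0, "q_base": 100.0, "q_given": 105.0},
    "fuels": {"value_added_1937": 480.0, "q_1937": 60.0, "q_base": 50.0, "q_given": 55.0},
}


def base_period_weights(series):
    """Convert 1937 value added to the 1935-1939 base; express as per cent of total."""
    converted = {}
    for name, s in series.items():
        relative_1937 = s["q_1937"] / s["q_base"]  # assumed direction of the relative
        converted[name] = s["value_added_1937"] / relative_1937
    total = sum(converted.values())
    return {name: 100 * value / total for name, value in converted.items()}


def production_index(series, weights):
    """Weighted average of output relatives, base period = 100."""
    weighted = sum(weights[n] * s["q_given"] / s["q_base"] for n, s in series.items())
    return 100 * weighted / sum(weights.values())


weights = base_period_weights(series)
index = production_index(series, weights)

# Equivalent weighted aggregative form: output valued at 1937 unit values.
unit_value = {n: s["value_added_1937"] / s["q_1937"] for n, s in series.items()}
aggregative = 100 * (
    sum(s["q_given"] * unit_value[n] for n, s in series.items())
    / sum(s["q_base"] * unit_value[n] for n, s in series.items())
)

print(round(index, 2), round(aggregative, 2))
```

The agreement of the two figures illustrates why an average of relatives with such weights is described as equivalent to the weighted aggregative method.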

Barometers, or Indexes, of General Business Conditions. Some 
composite indexes purport to be barometers or indexes of business 
and trade in general. These indexes are of two types: (1) A 
single series is sometimes believed to be a barometer of general 
business conditions and (2) a number of indexes of trade activity 
are combined in order to measure general business conditions. 

Of the first type, the most prominent one at present is
probably the index of electrical-power production, which is compiled
from quantity figures published by the Geological Survey. The
index of activity in the steel industry at one time was looked
upon as a good barometer of general business conditions because
so many industries are dependent upon steel or steel products.
The trends in the average of security market prices are sometimes
taken as a barometer of coming business conditions, or at least
as a measure of existing conditions. In wartime, the security
markets often reflect conditions and war information that are
not generally publicized.



A good example of the second type of index of general business
conditions is that published currently by the New York Times and
formerly by the Annalist. This index has been widely used
and until October, 1937, was reproduced in the Survey of Current
Business, published by the United States Department of
Commerce.

A more comprehensive example of this second type of index
of business conditions is one that has evolved from the pioneer
work of Carl Snyder, whose procedure was based upon the
theory that the fluctuations in total bank clearings are made up
of two variables: (1) price change and (2) change in physical
volume of trade. By constructing a price deflator, which has
already been discussed, and then by using this deflator to cancel
out from aggregate bank clearings that part due to price changes,
he sought to obtain an index of physical volume of trade for
the years 1875-1924.¹ Modifications and refinements were made
in the construction of this index by Leroy M. Piser, so that it
was known as the Snyder-Piser index of volume of trade for the
United States. It included 80 series, classified as follows:
productive activity, 46 series; primary distribution, 13 series;
distribution to consumer, 8 series; financial activity, 6 series;
general (such as life insurance, postal receipts, electrical-power
corporations, farmers, and communication), 5 series; and finally
debits outside New York City. Thus it came to be based upon
the principle of stratified sampling. This index of volume of
trade and production is published monthly by the Federal
Reserve Bank of New York in its Monthly Review of Credit and
Business Conditions.²
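
The deflating step in this procedure amounts to a division, as the following Python sketch shows. The index figures are hypothetical and are not Snyder's data; the sketch assumes only that both the clearings series and the price deflator are expressed as index numbers on the same base.

```python
# Hypothetical index numbers on a common base (base year = 100).
clearings_index = {1919: 100.0, 1920: 118.0, 1921: 90.0}
price_deflator = {1919: 100.0, 1920: 112.0, 1921: 86.0}

# Dividing the value series by the price deflator leaves the part of the
# change that is attributed to physical volume of trade.
volume_index = {
    year: 100 * clearings_index[year] / price_deflator[year]
    for year in clearings_index
}

for year, value in volume_index.items():
    print(year, round(value, 1))
```

In the hypothetical 1921 figures clearings fall below the base while the volume index stays above it, because prices fell further than clearings did.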

¹ Snyder, Carl, “A New Clearings Index of Business for Fifty Years,”
Journal of the American Statistical Association, Vol. 19 (1924), pp. 329-333.

² Johnson, Norris O., “New Indexes of Production and Trade,” Journal
of the American Statistical Association, Vol. 33 (1938), pp. 341-348.

Various forecasting services compile their respective indexes of
business conditions according to their particular interpretation
as to what should best be included in such an index and how
best to weight various factors. Carefully worked out indexes
of the marketing of farm products and forestry products are now
available as a result of the efforts of the Bureau of Agricultural
Economics in the United States Department of Agriculture.
These are reproduced in the Survey of Current Business. The
Bureau of Foreign and Domestic Commerce also publishes
indexes of domestic commodity stocks, and of “world stocks” of
certain outstanding industries, which constitute good barometers
of the related business conditions.¹

ADJUSTMENT OF INDEXES TO BENCH MARKS 

Ideal Conditions for Stratified Sampling Nonexistent. In order
to produce results approximating those of true random sampling,
conditions favorable to random sampling in the subgroups or
strata must exist. For one thing, this means that there must
be large numbers of items from which to draw within each
subgroup. It also means that the number of sample items must be
sufficient to avoid the disadvantages of small samples. The law
of large numbers must be given opportunity to produce the
results of true random selection within each stratum; such is the
sine qua non of truly successful stratified sampling. Under such
conditions the method of selection causes no accumulation of
bias. Such ideal conditions do not exist with reference to any
known index number, not even the Bureau of Labor Statistics
index of wholesale prices, which contains a total of more items
than any other index.

Nevertheless, the pattern of stratified sampling can with
considerable advantage be adopted as the guide to procedure in
the construction of indexes of all types. Following this pattern
the investigator first works out a system of classification of the
data for which an index is to be compiled. Using the subgroups
of this classification, he can then proceed according to the
principles of stratified random sampling so far as it is possible to do
so. When he finds that conditions ideal for random sampling in a
subgroup fail to exist, the investigator must resort to subjective
means to secure results that he believes will be representative.

Inasmuch as all indexes contain, in some part, data that have 
been collected and processed by the use of such subjective 
methods, employed in the absence of ideal sampling conditions, 
it is desirable wherever possible for the statistician to find bench 
marks with which he can compare the results of his sampling

¹ Survey of Current Business, Vol. 20, Annual Supplement (1940), pp.
88-164. For a more complete discussion of barometers of general business
conditions see Wesley C. Mitchell, Business Cycles: The Problem and Its
Setting (1928), pp. 291-330; Joseph L. Snider, Business Statistics; and
Garfield, op. cit.





STUDY OF DYNAMIC VARIABILITY


procedure. A common-sense appraisal of the over-all result is
the most generally used bench mark to judge whether or not the
results are satisfactory, but this method presupposes an unusual
amount of a priori knowledge and of scientific critical judgment
on the part of the statistician. Sometimes more objective bench
marks may be found to aid the statistical worker along his
thorny path. These will be illustrated in the ensuing section.

Reasons Why Indexes Require Adjustment. The reasons why
indexes require adjustment to bench marks do not necessarily
arise from faulty application of the method of stratified sampling.
They arise from the nature of the universe from which the sample
is taken. In connection with most types of data collected for
the construction of indexes, the universe is a discrete rather than
a continuous one; in other words, the universe consists of
comparatively small numbers of units, each of which constitutes a
comparatively large proportion of the whole universe. Often
they cannot be considered as representative of each other.

When such a comparatively small universe is subdivided, in
order to apply the stratified sampling technique, the strata
constitute universes with still smaller numbers. Added to this is
the usual fact that only a portion of this remaining small
number is accessible to the data collector; in some cases, unavoidable
bias itself constitutes a part of the reason for the accessible
portion. Under such circumstances it is almost impossible to
realize the essential condition of randomness of selection in the
respective strata, and consequently stratified sampling technique
gives less satisfactory results.

Such is the situation with respect to sample data collected
from business firms, especially manufacturing enterprises. In
some of the subdivisions, corporate enterprise is on so large a
scale that only a few firms represent a large portion of that
stratum. In all subdivisions, the size of the sample return
measures, not only changes in the trends it is desired to measure,
but also success or failure of the collecting agency in persuading
firms to report. The statistical technique of comparing
identical firms from month to month reduces but does not altogether
obviate the cumulative error resulting from this weakness.

In addition, growth of an industry, and hence growth in pay
rolls, in output, in stocks of materials, or in whatever is the
subject of investigation, occurs not only in existing firms, but part





of the growth is in the rise of new firms in the industry. In some
strata, perhaps the steel and machinery industries, expansion or
contraction of existing firms (and hence those reporting in a
stratified sample) may accurately reflect proportionately a rise
or fall in business in that stratum. But in other strata, like some
branches of the textile industry or the food industry, the
expansion or contraction of existing firms (and hence those reporting
in a stratified sample) may not at all reflect proportionately the
rise or fall in business.

In heavy industry, where plant and equipment constitute a
large proportion of the business investment, cyclical changes of
the sample might well be much greater than cyclical changes in
the universe. This would follow if the large firm, with heavy
investment in plant and equipment, tended to curtail production
instead of lowering prices when faced with declining business
prospects.

In some branches of the clothing and food industries, in which
small investment in plant and equipment and large numbers
of small firms predominate, cyclical changes may be smaller in
the sample than in the universe. The birth of new firms or
resumption of activity by old firms is the principal manner of
expansion in such strata. The death of old firms is the principal
means of decline. The reporting firms are quite likely to be the
ones that would not die at a rate so rapid as the average rate in
the industry.
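
The point may be illustrated by a small arithmetic sketch. The figures are hypothetical and chosen only for simplicity: a stratum of ten equal small firms, three of which go out of business in a decline, and a reporting sample of four firms that all happen to survive. Nothing here is taken from actual returns.

```python
def percent_change(before, after):
    """Percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before

# Hypothetical stratum: 10 small firms, each producing 100 units.
universe_before = [100.0] * 10
# In the decline 3 firms die; the 7 survivors keep producing 100 units.
universe_after = [100.0] * 7 + [0.0] * 3

# Assumption: the 4 reporting firms are all among the survivors.
reporting = [0, 1, 2, 3]
sample_before = [universe_before[i] for i in reporting]
sample_after = [universe_after[i] for i in reporting]

universe_change = percent_change(sum(universe_before), sum(universe_after))
sample_change = percent_change(sum(sample_before), sum(sample_after))
print(universe_change, sample_change)
```

Under these assumptions the universe falls by 30 per cent while the identical-firm sample shows no change at all, which is the kind of bias a bench mark is meant to disclose.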

Circumstances like those just described constitute only an
illustration of the type of problem facing the statistician, who
must continually endeavor to improve the sample of reporting
firms. Great as his efforts and ingenious as his imagination
may be, the resulting sample is likely to show bias.

For bench marks in connection with adjustment of indexes
of employment and pay rolls, statisticians have made use of the
successive issues of the census of manufactures, often called the
Biennial Census of Manufactures, which appeared in 1914, 1919,
and each odd year thereafter, including 1939. For years after
1939 it should be possible to get similar bench-mark data from
the records of the Social Security Administration.

By using the census-of-manufactures data as bench marks, it 
has been possible to check up on the monthly or weekly sample 
results obtained by the sampling process and to adjust them for 





any bias that is disclosed by such a check. As each new set of
manufacturing census data became available, which was about
two years from the time it was taken, such indexes could be
adjusted to the census data for errors that had accumulated since
the last census. In the meantime, the results of the sample were
relied upon as the best available information, and at the same
time subjective sampling procedures were continually studied
with a view to improvement wherever possible. To this end,
the adjustment procedure often discloses areas in which the
sampling results are especially in need of improvement.

The method of making such adjustment will be illustrated,
not so much as a valuable statistical device in itself, but as an
example to the student of the care and attention to form,
procedure, cross checking, and the like, required of a good
statistician. Thus the following instructions, together with the form
used, are presented as an exhibit to help the student visualize
how a statistician plans his work and works his plan.

Method of Adjustment Illustrated. A good example of an index
adjusted to United States census bench marks is the monthly
index of pay rolls and employment published by the Bureau of
Labor Statistics. The method of adjustment is reproduced by
permission of the Bureau of Labor Statistics and is applied to a
monthly index of pay rolls in the metal stamping, enameling,
and japanning and lacquering industry of New Jersey.¹ The
adjustment is carried out on Form BLS 1238, June, 1940,
presented here in Table 78.

The raw data, which have been adjusted for the 1937 and
earlier census figures or bench marks, but remain to be adjusted
for 1939 census data, are entered by months in columns (3),
(9), and (13); the sums and averages ($S$ and $\bar{I}$) for each of these
columns are then entered. In column (17), using the lower part
entitled “Formula if L is not available,”² enter the United
States census figure for 1937 and 1939 ($Z_1$ and $Z_3$). Calculate the
ratio $Z_3/Z_1$, and enter in the space provided; therefore, 0.933280

¹ The work on New Jersey data was done by a Works Progress
Administration project sponsored by the New Jersey State Labor Department for the
construction of monthly indexes of pay rolls and employment in
manufacturing industries, January, 1923-December, 1940. One of the authors was
called upon to serve as consultant and director of the project.

² The part labeled “Formula if L is available” is used with a blanket
adjustment method involving several census periods.





in Table 78. Copy $S_1$ from column (3), in the space provided
in column (17), that is, in Table 78, 883.82; this number
multiplied by the ratio equals $S'_3$, entered in the space provided,
in Table 78, that is, 824.85.

In column (13) calculate $R_3$ by finding $\sum nI$ for the year 1939
[the $n$ for each month is found in column (12)]:
$R_3$ = January $nI$ + February $nI$ + March $nI$ + ... + December $nI$,
including an $nI$ for each of the 12 months.

In column (18) enter $S'_3$, copying it from the last row of
column (17). Enter $S_3$, copying it from column (13). Subtract
$S_3$ from $S'_3$ and enter the difference in the next row of column (18).
Copy $R_3$ [from column (13)] in the next row of column (18).
This value, $R_3$, divided into the figure in the preceding row,
$S'_3 - S_3$, gives the value of d. Enter d in the last row of column
(18). This is the adjustment parameter. It is now used to
adjust the series by months as follows:

In columns (4), (10), and (14) enter 1 + nd for each month.
These values should be obtained on a calculating machine as
follows: Put 1.000000 in the machine, and add it. Put d on the
keyboard, being careful to place it correctly for the decimal
point. Subtract once, and record the answer for 1 + nd in
January, 1937. Subtract twice more (making -3 altogether),
and record the answer for 1 + nd in February, 1937. Subtract
twice more (making -5 altogether), and record the answer for
1 + nd in March, 1937. Subtract once more (making -6
altogether), and record the answer for 1 + nd in April, 1937.
The values for 1 + nd in May, June, July, and August are the
same as 1 + nd for April, March, February, and January,
respectively. They can be found by reversing the above process
on the calculator until 1.000000 remains in the machine and d
on the keyboard. For September add d once and record 1 + nd
for that month. Add d four more times (making 5 altogether),
and record 1 + nd for October. By following a similar
procedure, guided by n in columns (2), (8), and (12), values of
1 + nd are calculated and entered for each month through
December, 1939.

Enter in columns (5), (11), and (15) the indexes in columns
(3), (9), and (13), multiplied, respectively, by the 1 + nd for the
corresponding month in columns (4), (10), and (14). Add for
each year, and enter sums, which equal K, $S'_2$, and $S'_3$. Divide



Table 78. [U.S. Bureau of Labor Statistics, Form BLS 1238, June, 1940.
The body of the form is not reproduced here. Columns (16), (17), and (18)
are headed “To find h, year 1 only,” “To find $S'_3$,” and “To find d.”]





Source: The method here illustrated was supplied by Sidney W. Wilcox, Chief Statistician of the Bureau of Labor Statistics. Bureau of Labor
Statistics, Bulletins 610 (October, [year illegible]), [number illegible] (November, [year illegible]), [number illegible] (January, 1937), and [number illegible] (April, 1937), on revised indexes of factory employment.
The revised indexes appear currently in the monthly bulletin on “Employment and Pay Rolls,” Serial No. 11589, and the back figures are published in
full from 1919 to date in Federal Reserve Bulletin, Vol. 24 (1938), pp. 838-800. Cf. also recommendations of the Committee on Government Statistics
in “Recent Progress in Employment Statistics” by Aryness Joy, Journal of the American Statistical Association, Vol. 29 (1934), pp. 355-371.





each of these by 12, and enter the quotients in the next row.
Thus, in Table 78, K = 883.88, $S'_2$ = 654.19, and $S'_3$ = 824.84,
and divided by 12 these become 73.66, 54.52, and 68.74.

In column (16) enter the $S_1$ and K, as indicated in the table;
subtract K from $S_1$, and enter the difference. Divide this
difference by 40 and enter the quotient in the next row. This
figure is h, the parameter for the second adjustment. If $S_1 - K$
is smaller than 0.05, regardless of sign, do not calculate h. In
the problem illustrated, $S_1 - K = -0.06$; hence h is calculated
and found to be -0.002.

If h is calculated, enter in column (6), for each month, mh;
that is, in January enter h, in February enter 2h, in March enter
3h, in April enter 4h, in May to August, inclusive, 5h, thereafter
declining each month, with 2h in November and h in December.
Enter in column (7) the sum of the figures for the respective
months in columns (5) and (6). The sum of column (7) is equal
to $S'_1$. If h is not used, the sum of column (5) is taken as $S'_1$; that
is, if h is ignored, $K = S'_1$.
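
The arithmetic of the form can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Bureau's own procedure: the function name, the raw figures, and most of the schedule of n are hypothetical (only the first ten values of n and the ratio 0.933280 come from the text), and the multipliers m = 1, 2, 3, 4, 5, 5, 5, 5, 4, 3, 2, 1 are read from the instructions above.

```python
def adjust_to_benchmark(raw, n, census_ratio):
    """Adjust 36 monthly indexes (3 years) to a census bench mark.

    raw          -- monthly indexes for years 1, 2, 3 (year 1 already on a census basis)
    n            -- one multiplier n per month (hypothetical schedule in this sketch)
    census_ratio -- Z3 / Z1, census figure for year 3 over that for year 1
    """
    s1 = sum(raw[:12])                     # S1, raw sum for year 1
    s3 = sum(raw[24:36])                   # S3, raw sum for year 3
    s3_target = s1 * census_ratio          # S'3, the sum year 3 should show
    r3 = sum(ni * xi for ni, xi in zip(n[24:36], raw[24:36]))  # R3 = sum of n*I
    d = (s3_target - s3) / r3              # adjustment parameter d
    adjusted = [xi * (1 + ni * d) for ni, xi in zip(n, raw)]

    # Second adjustment: restore the year-1 total when it has moved by 0.05 or more.
    k = sum(adjusted[:12])                 # K
    m = [1, 2, 3, 4, 5, 5, 5, 5, 4, 3, 2, 1]   # multipliers of h; they sum to 40
    h = 0.0
    if abs(s1 - k) >= 0.05:
        h = (s1 - k) / 40
        for month in range(12):
            adjusted[month] += m[month] * h
    return adjusted, d, h


# Hypothetical raw monthly indexes for three years (not the New Jersey data).
raw = [70.0 + i for i in range(12)] + [55.0] * 12 + [73.0] * 12
# Schedule of n: the first ten values follow the text; the rest are invented.
n = [-1, -3, -5, -6, -6, -5, -3, -1, 1, 5, 10, 14]
n += [18 + 4 * i for i in range(12)]
n += [66 + 4 * i for i in range(12)]
ratio = 0.933280                           # Z3 / Z1 as quoted in the text

adjusted, d, h = adjust_to_benchmark(raw, n, ratio)
print(round(d, 6), round(h, 4))
print(round(sum(adjusted[:12]), 2), round(sum(adjusted[24:]), 2))
```

With these figures the adjusted sum for year 3 equals the year-1 sum multiplied by the census ratio, and the second adjustment returns the year-1 sum to its original value, which is what the two parameters d and h are designed to accomplish.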



CHAPTER XX 

RATIONAL BASIS OF THE ANALYSIS OF TIME SERIES 


Elements of Variation in Time Series. The elements of
variation contained in an ordinary time series may be illustrated
by building up a hypothetical time series.


The first element in the time series is 
long-time growth, or trend. People living 
in the twentieth century are accustomed
to the idea that things grow, or progress. 
Table 79, column (1), shows years and 
months for 3 years, and column (2) shows 
a set of figures that grow at the constant 
difference of 0.2 per month. This column 
of figures is plotted in Fig. 135 (AA') and 
is a picture of the growth, or trend, in 
the hypothetical time series. 

Time series are also likely to have
seasonal variations. Many economic and
social phenomena vary from season to
season in a similar manner each year.
This is most evident in the case of
activities affected by weather, such as
agricultural production; but such patterns
of seasonal variation occur in other events
as well. Suppose the seasonal variation
in the hypothetical time series is such that
November is usually 58 per cent above
the average month, July is usually only
43 per cent as large as the average
month, etc., as indicated in Table 80
showing the index of seasonal variation
for the hypothetical time series.
Figure 136 is a graph of this seasonal variation as it occurs year
after year, 1943, 1944, and 1945.

Fig. 135. Two of the component parts of a hypothetical time series.
AA' = annual trend; BB' = assumed trend, modified by annual seasonal
variation.




Table 79. Hypothetical Time Series Built Up

   (1)           (2)         (3)          (4)          (5)
                Growth     Seasonal      Cycle      The cycle
               or trend    variation   (per cent)     put in

1943
  January        1.0         1.45         100         1.45
  February       1.2         1.49         101         1.50
  March          1.4         1.41         102         1.44
  April          1.6         1.33         103         1.37
  May            1.8         1.19         106         1.26
  June           2.0         1.04         109         1.13
  July           2.2         0.95         112         1.06
  August         2.4         1.01         115         1.16
  September      2.6         2.42         120         2.90
  October        2.8         4.00         122         4.88
  November       3.0         4.74         124         5.88
  December       3.2         5.02         125         6.28
1944
  January        3.4         4.93         126         6.21
  February       3.6         4.46         130         5.80
  March          3.8         3.84         140         5.38
  April          4.0         3.32         150         4.98
  May            4.2         2.77         160         4.43
  June           4.4         2.29         180         4.12
  July           4.6         1.98         200         3.96
  August         4.8         2.02         210         4.24
  September      5.0         4.65         160         7.44
  October        5.2         7.44         140        10.42
  November       5.4         8.53         112         9.55
  December       5.6         8.79         111         9.76
1945
  January        5.8         8.41         109         9.17
  February       6.0         7.44         107         7.96
  March          6.2         6.26         105         6.57
  April          6.4         5.31         103         5.47
  May            6.6         4.36         101         4.40
  June           6.8         3.54          95         3.36
  July           7.0         3.01          90         2.71
  August         7.2         3.02          85         2.57
  September      7.4         6.88          75         5.16
  October        7.6        10.87          65         7.07
  November       7.8        12.32          60         7.39
  December       8.0        12.56          55         6.91





Table 80. Seasonal Variation
(In percentages of the average month)

January ....  145     May ......   66     September ..   93
February ...  124     June .....   52     October ....  143
March ......  101     July .....   43     November ...  158
April ......   83     August ...   42     December ...  157


When the seasonal variation and trend are combined, a line 
like BB' in Fig. 135 is produced; the data are shown in column 
(3) of Table 79. To obtain each monthly value for the line 
BB' each monthly coordinate of the line AA', that is, the growth 
element in the time series, has been 
multiplied by the index of seasonal vari- 
ation for the corresponding month. This 
has the effect of redistributing the total of 
the 12 monthly figures of the growth line
in such a manner as to make them prop- 
erly reflect the seasonal element. Thus 
the trend figure for January is multiplied 
by 1.45 (or 145 per cent) while the April 
trend figure is multiplied by 0.83 (or 83 
per cent). 

A third element of variation in time
series is cyclical fluctuation, which may
extend over several years. For example,
Fig. 137 shows the rising and the falling
movement of a cyclical fluctuation by
months that occurs over a period of 3
years; this is shown also in column (4) of
Table 79. In column (5) of Table 79 and
in Fig. 138 are shown the effect of combining also the cyclical
movement. The figures for the respective months are now
altered according to whether the cycle is carrying them upward
or downward, and the percentage figures for the cycle, shown in
column (4), depict this upward and downward swing of the cycle.
The cycle is put into the data by multiplying each monthly
figure in column (3) by the corresponding monthly index of the
cycle found in the same row of column (4). The results are
shown in Fig. 138, which is the final hypothetical time series;
the data for it are in column (5) of Table 79.
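
The building up of the hypothetical series can be retraced in a short sketch. The function names are illustrative only; the trend rule (1.0 rising by 0.2 per month) and the seasonal index are those of the text and Table 80, and the cycle percentages are those of column (4) of Table 79 for 1943.

```python
# Index of seasonal variation, January to December (Table 80), in per cent.
SEASONAL = [145, 124, 101, 83, 66, 52, 43, 42, 93, 143, 158, 157]
# Cycle, in per cent, for the twelve months of 1943 (Table 79, column 4).
CYCLE_1943 = [100, 101, 102, 103, 106, 109, 112, 115, 120, 122, 124, 125]


def build_series(months, start=1.0, step=0.2):
    """Return (trend, trend x seasonal) for the given number of months."""
    trend = [start + step * k for k in range(months)]
    with_seasonal = [t * SEASONAL[k % 12] / 100 for k, t in enumerate(trend)]
    return trend, with_seasonal


def apply_cycle(values, cycle_percent):
    """Multiply each monthly figure by the corresponding cycle percentage."""
    return [v * c / 100 for v, c in zip(values, cycle_percent)]


trend, col3 = build_series(36)               # columns (2) and (3), 1943-1945
col5_1943 = apply_cycle(col3[:12], CYCLE_1943)   # column (5) for 1943

# September, 1943: trend, trend x seasonal, and the final figure.
print(round(trend[8], 1), round(col3[8], 2), round(col5_1943[8], 2))
```

Each step is a simple multiplication, so the order in which the seasonal index and the cycle are put in does not affect the final figures.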



Fig. 136. Seasonal variation in the hypothetical time series. See
Table 80.





Two important effects of combining the growth element and
the seasonal variation element are noticeable from Fig. 135.
In the first place, the combination has a tendency to obscure the
trend. It is still clear in line BB' that there is a rising tendency,
but the wide sweeps of the seasonal fluctuations tend to conceal
the exact nature of the rise, for without the line AA' in Fig. 135 it
would be difficult to visualize precisely what the slope of this
trend actually is. In the second place, the combination definitely
distorts the shape of the seasonal variation, in two ways: (1) It
causes the valleys and peaks to be thrown out of line
arithmetically. (2) It minimizes the size of the seasonal variation
where trend is low and exaggerates the size of the seasonal
variation where trend is high.

Fig. 137. The cycle in the hypothetical time series. See Table 79,
column (4).

Fig. 138. All three component elements of the hypothetical time series
combined. See Table 79, column (5).
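
Both distortions can be checked numerically from the trend rule and the seasonal index of Table 80. The following is a minimal sketch; the helper name and the use of the range (highest month less lowest month) as a measure of the size of the seasonal swing are choices made here for illustration.

```python
SEASONAL = [145, 124, 101, 83, 66, 52, 43, 42, 93, 143, 158, 157]  # Table 80
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def trend_times_seasonal(year_index):
    """Twelve values of trend x seasonal for year 0, 1, or 2 (1943-1945)."""
    values = []
    for m in range(12):
        trend = 1.0 + 0.2 * (12 * year_index + m)   # growth of 0.2 per month
        values.append(trend * SEASONAL[m] / 100)
    return values


# Peak and trough of the seasonal index itself.
print("index", MONTHS[SEASONAL.index(max(SEASONAL))],
      MONTHS[SEASONAL.index(min(SEASONAL))])

# Peak, trough, and range of the combined figures in each year.
for y, label in enumerate([1943, 1944, 1945]):
    vals = trend_times_seasonal(y)
    peak = MONTHS[vals.index(max(vals))]
    trough = MONTHS[vals.index(min(vals))]
    print(label, peak, trough, round(max(vals) - min(vals), 2))
```

The seasonal index is highest in November and lowest in August, but in the combined figures the rising trend shifts the high point to December and the low point to July, and the range between them widens from year to year as the trend rises.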

From Fig. 138 it is clear that the effect of including the
cyclical movement is further to obscure the trend or growth
element and to distort still more the character of the seasonal
variation. It is in approximately this condition that most time
series exist in their raw state. Raw data of time series contain



in varying degrees elements of all three of these types of
fluctuation. Some have little seasonal variation, some have a great
deal, and some have none. Many have rising trends following
population and general growth, while a few have declining
trends because they represent decaying or disappearing types
of economic or social activities. In practically all time series,
cycles of varying length and varying amplitude occur.

In addition to the three elements illustrated by the hypo- 
thetical case, most time series contain fluctuations due to unusual 
or residual occurrences, such as the effects of floods, storms, or 
strikes. This gives four elements or types of fluctuation and 
these four types of fluctuation serve as a good classification for an 
empirical start in the analysis of time series.¹

GENESIS AND PURPOSES OF THE TIME-SERIES ANALYSIS 

The hypothetical problem just illustrated consisted in a 
synthesis. The study of time series is analysis — a reversal of the 
procedure that has just been demonstrated. This breaking up 
of time series into its constituent elements, and the various com- 
plications involved, constitutes the subject of time-series analysis. 
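
In symbols, the synthesis of Table 79 and its reversal may be sketched as follows. The letters Y, T, S, and C are labels chosen here and are not the authors' notation; S and C are taken as ratios rather than percentages. Treating the fourth, residual element as a further multiplier I is an assumption, since the hypothetical series contains no such element.

```latex
% Synthesis (Table 79): final series = trend x seasonal x cycle
Y_t = T_t \times S_t \times C_t
% Analysis: dividing out trend and seasonal leaves the cycle
C_t = \frac{Y_t}{T_t \, S_t}
% Assumed extension with a residual element I_t
Y_t = T_t \times S_t \times C_t \times I_t
```

For September, 1943, for example, Y = 2.90, T = 2.6, and S = 0.93, so that C = 2.90/(2.6 × 0.93), or about 1.20, which is the 120 per cent shown in column (4) of Table 79.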

Why do economists, social scientists, and statisticians analyze
time series? What started them along this line of procedure,
and what are its advantages? The answers to these or other
questions as to the significance of time-series analysis have in
general a threefold basis: (1) interest in the population problem
and the discovery of the law of organic growth, (2) concern for
the general problem of the so-called “business cycle,” and (3)
preoccupation with the variety of problems associated with
seasonal influences upon business and social life.

Rational Trends. Historical Background. In 1798, Thomas 
Robert Malthus, a minister of the gospel and a political econo- 
mist, wrote an Essay on the Principle of Population, in which he 
advanced the fundamental principle that the law of growth of 
population is geometric; population, he said, tends to grow in a
geometric progression. The curve representing population 

¹ This is the conventional classification of types of fluctuation that occur
in time series; it was presented in detail by W. M. Persons of the Harvard
Committee of Economic Research and published in the Review of Economic
Statistics, Preliminary Vol. 1. See also articles by the same author in the
American Economic Review, Vol. 6 (1916), pp. 739-769, and Publications
of the American Statistical Association, Vol. 12 (1917), pp. 602-642.




growth would accordingly be a positive exponential curve,
similar in character to the curve representing the growth of a
principal sum of money at compound interest.
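
The identity of the two laws is easily seen in a small sketch. The function name, the starting figure of 1,000, and the rate of 3 per cent are hypothetical and serve only to illustrate a geometric progression.

```python
def geometric_growth(initial, rate, periods):
    """Values of initial * (1 + rate) ** t for t = 0, 1, ..., periods."""
    return [initial * (1 + rate) ** t for t in range(periods + 1)]


# Hypothetical population of 1,000 growing 3 per cent per period.
population = geometric_growth(1000.0, 0.03, 25)
# The same rule read as a principal of 1,000 at 3 per cent compound interest.
principal = geometric_growth(1000.0, 0.03, 25)

# In a geometric progression the ratio of successive terms is constant.
ratios = [later / earlier for earlier, later in zip(population, population[1:])]
print(round(population[-1], 2), round(min(ratios), 4), round(max(ratios), 4))
```

The successive increments grow larger and larger, while the ratio of each term to the one before it remains fixed; this is what distinguishes geometric from arithmetical progression.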

While some of the doctrines of Malthus regarding the controls
to population growth are no longer accepted as tenable, the
fundamental principle of the tendency of population to grow
geometrically has not only been accepted with regard to
population theory but has been widely applied in other fields. To
people of the twentieth century this principle seems almost
axiomatic, for they are familiar with the history of the nineteenth
century, when the statistics show such a growth of population
and such a development of many kinds of activities according to
this principle of geometric progression.

The principle of growth was not so obvious to those living at
the time of Malthus, nor to those living in the middle of the
nineteenth century. Consequently, it was startling and new to
see the same principle applied to growth in certain economic and
social phenomena, as was done by William Stanley Jevons, an
English economist, in his celebrated book on The Coal Question
(1865). Chapter IX of that book is entitled Of the Natural Law
of Social Growth. In this he propounds the idea that many of
the phenomena of economic and social life follow the same law
of organic growth as population. In some, the progressive rate
of geometric growth is greater than that of population; in some,
less; but in all the growth is geometric. In another chapter of
the same book, Jevons applied this principle and tested it with
reference to England's progress in industry. His contribution
was of the nature of the proposal of a hypothesis that served as a
challenge to mathematically minded economists like himself
and others and soon stimulated the development of ideas as to
how best to write the equation for the curve that would
represent growth of population. By such an equation, it was thought,
population could be forecast far into the future as well as for
intercensus years.

Population Curves. In 1891, H. S. Pritchett suggested that
an equation of the form $P = a + bt + ct^2 + dt^3$ would fit the
curve of population growth. The subject of equations for the
population curve became one of wide concern to population
students, economists, and scientists in general, as well as of
practical interest in obtaining accurate estimates of population




between the dates of taking the census. In 1907, Raymond
Pearl proposed that the form of this equation should be*

$$P = a + bt + ct^2 + d \log t$$

The problem was again approached by G. Udny Yule, an English
statistician, in 1925;¹ and in later years the discussion was
continued.²

Perhaps the most striking contribution on the subject is that
of Raymond Pearl and Lowell J. Reed, who in 1920 advanced the
idea that the population curve should not continue to rise
indefinitely but should level off after some period of time and
that thus the population curve showing the law of growth would
not follow the compound-interest curve indefinitely. Rather, it
would resemble the curve shown in Fig. 145 in Chap. XXI.

The mathematical characteristics of this curve and its
equations are presented by these joint authors in the Journal of the
Royal Statistical Society for 1927.† As will be clear from a glance
at Fig. 145, the shape of this curve indicates, as to the growth of
population or the law of organic growth, that a first period
of relatively slow arithmetical growth is followed by a period
of very rapid arithmetical growth but that finally a period of
slowing down of this rapid arithmetical growth occurs, so that the
curve at the top assumes an asymptotic character.
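
The three phases can be seen in a small numerical sketch. The formula used, limit / (1 + e^(-rate (t - midpoint))), is one common way of writing a logistic curve and is assumed here for illustration; it is not necessarily the parametrization of Pearl and Reed, and the limit of 200, rate of 0.5, and midpoint of 10 are hypothetical.

```python
import math


def logistic(t, limit, rate, midpoint):
    """A common form of the logistic curve (assumed for this sketch)."""
    return limit / (1.0 + math.exp(-rate * (t - midpoint)))


# Hypothetical curve approaching an upper limit of 200.
values = [logistic(t, 200.0, 0.5, 10.0) for t in range(0, 21)]
# Arithmetical growth from one period to the next.
increments = [later - earlier for earlier, later in zip(values, values[1:])]

print(round(increments[0], 2), round(increments[9], 2), round(increments[-1], 2))
print(round(values[-1], 2))
```

The increments are small at first, largest near the middle of the curve, and small again at the end, while the values themselves rise toward the limit without reaching it.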

Early Population Theories. Quételet remarked that Malthus's
doctrine resolved itself essentially to the proposition that, under
the most favorable industrial circumstances, population could
grow no more rapidly than in an arithmetical progression,
although, of course, he stated the geometric law of growth as a
* Knibbs, George H., “The Laws of Growth of Population,” Journal of
the American Statistical Association, Vol. 21 (1926), p. 381.

¹ Journal of the Royal Statistical Society, Vol. 88 (1925), pp. 1-62, which
contains an excellent historical summary of the problem of curve fitting to
population growth.

² Reed, L. J., and Raymond Pearl, “On the Summation of the Logistic
Curve,” Journal of the Royal Statistical Society, Vol. 90 (1927), pp. 729-746.
The mathematics of the curve was discovered, say the authors, by Verhulst,
according to Quételet writing in 1838, and was again applied to population
by Pearl and Reed in 1920. Cf. Pearl, Raymond, Studies in Human
Biology (1924), Chap. XXIV, The Curve of Population Growth.

† Op. cit.




tendency. It could grow only in arithmetical progression
because it would be kept down to that rate by the fact that
subsistence grows only in arithmetical progression. He also pointed
out that the theory of population growth up to the time he was
writing (1836) had not been developed to the point where it
could be considered “dans le domaine des sciences mathématiques,
auquel elle semble spécialement devoir appartenir” (within the
domain of the mathematical sciences, to which it seems especially
to belong).¹
Even so, Quételet himself never went to the point of developing a
mathematical equation expressing the law of population growth,
although in other ways his contributions as a population theorist
are outstanding. However, he did reach the point of suggesting
that the law of population growth is like that of a body traveling
through a resisting medium that tends to attain a limiting
velocity.

Yule suggested that this analogy probably inspired Verhulst,
professor of mathematics at the École Militaire, to a controversy
with Quételet on the subject. The problem of devising a
mathematical law of population growth was actively studied by
Verhulst for a number of years. He fitted logistic curves to the
population histories of several countries for as many years as
data were available, but the limited amount of data did not
inspire confidence in the results.² This work of Verhulst seems
to have been forgotten until the time of the Pearl-Reed studies
of 1920. Pearl and Reed's discovery of the law of population
growth in the mathematical form developed by them was
independent. As Yule says, they seemed to have been unaware
of the formulation by Verhulst.

Basis for Rationalizing Trends. The attempt is made by
students of the law of population growth to rationalize the
fitting of such a logistic curve to experienced growth of
population in many parts of the world, and at different times, by basing
their reasoning upon the following points:

¹ ... (Bruxelles: Louis Hauman et Comp., 1836), pp. 283, 287.

² Notice sur la loi que la population suit dans son accroissement
(Correspondance mathématique et physique publiée par A. Quételet, 1838), tome 10
(also numbered tome 2 of the third series), pp. 113-121; and by the same
author, “Recherches mathématiques sur la loi d'accroissement de la
population,” Nouveaux mémoires de l'Académie Royale des Sciences et Belles-Lettres
de Bruxelles, tome 17 (1845), pp. 1-38; “Deuxième mémoire sur la loi
d'accroissement de la population,” ibid., tome 20 (1847), pp. 1-32. Citations
from G. Udny Yule, op. cit., p. 57.




1. The construction of such a curve through the plotted points
showing actual population growth in a large number of places
produces a good fit.

2. Biological experiment under controlled conditions, with
other species than man, produces increases in numbers in a
manner following such a curve. Thus Pearl made such an
experiment with fruit flies under controlled conditions.¹

3. Studies of trends in birth rates and death rates, in their
relation to population growth, appear to fit into the theory that
the law of population growth follows this curve.

4. Studies of death rates by age distribution of the population
and the relationship between age composition and total death
rate and birth rate of a population appear to fit into the law of
population thus formulated.²

5. While it is true that the parabolas of earlier writers fit
empirically the population growth wherever tried, such a curve
fit cannot be rationalized, because the extension of the parabola
goes on to infinity. On the other hand, the logistic curves of the
Verhulst, Pearl-Reed, or Gompertz variety approach a limit in an
asymptotic manner, which seems to be a more rational manner in
which to view the law of population growth.

6. The asymptotic limit that it is assumed population is
approaching can be closely approximated by study of the
circumstances surrounding the determination of the factors influencing
population growth.

Thus, it is recognized in this theory of the law of population growth that should technological changes comparable with the industrial revolution occur, the asymptotic limit might have to

¹ Cf. Pearl, R., The Biology of Death, pp. 253-254. Cited in Yule, op. cit., p. 22.

² These ideas have reached the general public as well as the scientific group, through such articles as Robert A. Kuczynski, “The World’s Future Population,” The New Republic, May 7, 1930; Aaron Hardy Ulm, “Our Falling Birth Rate Is Studied by Experts,” The New York Times, Mar. 2, 1930; Louis I. Dublin (Statistician of the Metropolitan Life Insurance Company), “America Approaching Stabilized Population,” The New York Times, Mar. 4, 1930; and by the same author, “Our Aging Population; Its Vital Effects,” The New York Times, Jan. 4, 1931. Cf. also Dublin, Louis I., and Alfred J. Lotka, “On the True Rate of Natural Increase,” Journal of the American Statistical Association, Vol. 20 (1925), pp. 305-339; and Dublin, Louis I., “The Statistician and the Population Problem,” ibid., Vol. 20 (1925), pp. 1-12.





be raised and that the law of population growth over a period of centuries may be conceivably a series of ogive-like cycles.
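The contrast drawn in point 5 above, between a parabola that goes on to infinity and a logistic curve that approaches a limit, can be seen numerically. The following is a minimal sketch only: it assumes the common three-parameter logistic form K / (1 + e^(a - bt)), and every parameter value is invented for illustration rather than taken from Verhulst, Pearl, or any fitted population series.

```python
import math

def logistic(t, K=200.0, a=4.0, b=0.03):
    """Logistic curve K / (1 + exp(a - b*t)); it approaches the limit K.

    K, a, and b are hypothetical values chosen only for illustration.
    """
    return K / (1.0 + math.exp(a - b * t))

def parabola(t, c0=5.0, c1=0.5, c2=0.01):
    """Second-degree polynomial c0 + c1*t + c2*t**2; it grows without bound.

    The coefficients are hypothetical values chosen only for illustration.
    """
    return c0 + c1 * t + c2 * t * t

# Compare the two curves as t is extended far beyond the fitted period.
for t in (0, 100, 200, 400, 800):
    print(t, round(logistic(t), 2), round(parabola(t), 2))
```

Raising K in this sketch corresponds to the remark above that a technological change might raise the asymptotic limit.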

Criticism of Rationalized Trends. However, this rationalistic view of curve fitting to population and the attainment in this manner of a mathematical law of population growth have not gone unchallenged. Prof. A. L. Bowley, an outstanding English statistician, says, “I regret that so much prominence has been given to the logistic equation. It certainly has the merit, and the danger, of mathematical neatness, and it expresses what may be regarded as a fundamental law of population, that is, that population cannot increase indefinitely in constant geometric progression. There is, however, no reason a priori to suppose that the damping down of the increase is of so regular or uniform a nature that a mathematical function of the same form represents it in all times and in all places, and none a priori to justify the use of a linear term (out of all possible functions) for this purpose. We should rather anticipate that the form of the function would be neither general nor linear. The justification for the logistic form is purely empirical, and, in fact, we are asked to accept it because it does give results which agree with the records of certain populations. Any other curve which gives as good an agreement has similar claims for representing the series of records. The closeness of the agreement is, I think, unduly accented by the very small vertical scales used by Dr. Pearl and Mr. Yule.”¹

T. H. C. Stevenson, another English statistician, rather prosaically declares that he finds sufficient explanation, without resort to logistic curves, for the rapid decline in birth rates since the end of the nineteenth century, in the dissemination of knowledge of contraception.²

More recently, the whole question of the rationality of curve fitting was taken up in an admirably thorough manner by George H. Knibbs, whose findings are apparently that the mechanical process of the curve fitting is empirical and must be accepted as empirical but that the law of population growth may be conveniently expressed by such equations when it is

¹ From remarks on Yule's paper, op. cit., p. 76.

² “The Laws Governing Population,” Journal of the Royal Statistical Society, Vol. 88 (1925), pp. 63-76.




thoroughly understood how those equations apply, and also their limitations.¹

It is natural to scientists to be skeptical, particularly of other 
scientists’ startling discoveries, and the student of social science 
must get used to such controversies and pick and choose for 
himself what he believes to represent progressive development 
of human knowledge and what merely overzealous creative 
imagination. It is in these attempts to explain phenomena that 
the progressive development of human knowledge occurs. 

Application of Rational Trends to Social Philosophy. It was pointed out above that Jevons had advanced the hypothesis that the law of organic growth applies also to social and economic phenomena. Following the example of the population curve-fitting group, scientific curiosity turned to the discovery of a rational conception of curve fitting to social, biological, and economic phenomena in order to replace purely empirical methods. As Wesley C. Mitchell has pointed out,² “A step toward such a conception is represented by the frequent interpretation of certain trend lines as showing the ‘growth factor.’ Statisticians dwell with satisfaction upon their demonstrations that certain industries have expanded decade after decade at a substantially uniform rate, or at a rate which has changed in some uniform way. They take almost as much pleasure in contemplating the somewhat similar rates at which different industries have grown in given periods and countries. Nor are they at a loss for explanation of these uniformities. In view of the increase in population characteristic of the great commercial nations and of the advance in industrial technique, it seems scarcely fanciful to think of modern society as ‘tending’ to produce an ever larger supply of goods for the satisfaction of its wants. On this basis, cyclical fluctuations appear as alternating accelerations and retardations in the pace of a more fundamental

¹ “Laws of Growth of Population,” Journal of the American Statistical Association, Vol. 21 (1926), pp. 381-398; and Vol. 22 (1927), pp. 49-59.

² Business Cycles: The Problem Stated and Its Setting (1928), pp. 221-224; cf. Prescott, Raymond B., “Law of Growth in Forecasting Demand,” Journal of the American Statistical Association, Vol. 17 (1922), pp. 471-479. Later, Leroy E. Peabody fitted such a curve to railway traffic in the United States, “Growth Curves and Railway Traffic,” Journal of the American Statistical Association, Vol. 19 (1924), pp. 476-483.





process. Secular trends, in short, are taken to measure economic progress generation by generation.

“A bold speculation of this sort has been ventured by Raymond B. Prescott. He suggests that perhaps ‘all industries, whose growth depends directly or indirectly upon the ability of the people to consume their products,’ pass through similar phases in the course of their development. Four stages seem to be common:

1. Period of experimentation.

2. Period of growth into the social fabric.

3. Through the point where the growth increases, but at a diminishing rate.

4. Period of stability.

“On this basis, Prescott suggests that the secular trends of all such industries may be represented by a single type of curve: that yielded by the Gompertz equation. Every country may have a different rate of growth and so may every industry, because no two industries have the same combination of influences. They will trace the same type of curve, however, even though the rate of growth is different.”

More recently, an ambitious and carefully studied attempt to rationalize the whole subject of trends in economic phenomena was made by Simon S. Kuznets, of the National Bureau of Economic Research.¹ Kuznets analyzes the various factors making for growth, and also making for slowing up of growth, under the following items:

1. On the side of growth:

Population increase.

Changes in demand.

Technological changes.

2. On the slowing up of growth:

Slackening of technological progress.

Retarding influence of other, slower industries.

Funds available for expansion decrease in relative size as industry grows.

Competition of later developing industries in other countries.

¹ Secular Movements in Production and Prices (1930). In 1943, Kuznets's work is still the best statistical study of this type. For more recent trend studies of a different type, see Edwin Frickey, Economic Fluctuations in the United States, 1866-1914 (1942), and Norman J. Silberling, The Dynamics of Business (1942). These studies use subjective methods for analyzing trends and cycles.




Kuznets fits logistic curves to a large number of production series and also fits appropriate curves to the corresponding price series. It should be noted that this type of rationalization does not apply to price series, and as a rule the curves that Kuznets fits to his price series are merely parabolas and represent empirical trends. One of the most interesting results of his work is his discovery and analysis of “secondary trends.”



Fig. 139. Production of Portland cement in the United States with logistic trend line, 1880-1924.

Thus, from a large variety of data, he took out the long-time growth, upon the assumption of the existence of a logistic growth element, and he found, not only cycles, but also longer wavelike movements of 11 to 20 years. This is illustrated by Figs. 139 to 141, reproduced from his book and showing the type of analysis as applied to cement production and prices, 1880-1924.¹ As seen in Fig. 139, the heavy line represents the logistic curve, and there are long sweeps of the actual data in waves above and below

¹ Op. cit., pp. 100-101, reproduced by permission of the author.





this growth curve, as well as cyclical movements of shorter duration. Figure 140 shows a parabola fitted to the course of



Fig. 140. Factory prices of Portland cement in the United States, original data and primary trend, 1880-1924.

prices of cement during the same period. Figure 141 shows the long wavelike movements in production and in prices, with the relative fluctuations of the actual data above and below these



secondary trends. Kuznets calls the logistic growth curve the “primary trend line,” and the heavy, black, wavelike line in Fig. 141 represents the secondary trends of the production of






cement. The actual data fluctuate above and below these secondary trends in major and minor cycles.

Before the publication of Kuznets's work these longer movements had been studied by C. A. R. Wardwell.¹ Wardwell called the movements “major cycles” rather than secondary trends, and his method of analysis was quite different from that followed by Kuznets. He also attempted to give an explanation of the major business cycle that Kuznets rejects.² In 1927, also, there appeared in Russian a discussion of the whole problem of major cycles, which contains a report by Kondratieff and a counter-reply by D. I. Oparin. To explain these major cycles Kondratieff developed the theory that they are essentially cycles of expansion and contraction in the growth of the basic capital equipment of a country.

Thus, starting with the desire to define the law of population growth more precisely and to bring the population problem into the realm of mathematical treatment, scholars have carried on by analogy into other fields; so far as economics is concerned, the principal result so far is the discovery of these long wavelike movements. Not only do the theoretical economists need to explain the old-fashioned business cycle (which was always a rather vague concept), but they now are challenged to explain (1) secondary secular movements or major business cycles, (2) ordinary business cycles, and (3) minor business cycles. The analysis of time series, then, must include some additional types of fluctuations from those described in a preliminary manner at the beginning of the chapter.

The following classification of movements is now suggested:³

1. Trend, or long-time growth, which appears to be logistic in character and for which a mathematical formula may be rational.

2. Cyclical movements of three types, for which a rational mathematical formula is not appropriate.

a. Secondary secular movements or major cycles.
b. Cycles (the old theoretical business cycle).
c. Minor cycles.

¹ An Investigation of Economic Data for Major Cycles, Thesis (University of Pennsylvania, 1927).

² Op. cit., pp. 205-260.

³ Cf. classification suggested by Prof. Willford I. King, which is similar, in “Principles Underlying the Isolation of Cycles and Trends,” Journal of the American Statistical Association, Vol. 19 (1924), p. 468.




3. Seasonal variations, for which a mathematical formula is not rational.

4. Irregular fluctuations, such as those due to wars, epidemics, floods, or strikes. These are called “residual fluctuations” and may follow the normal curve.¹

Empirical Trends. Trend analysis, that is to say, the application of mathematical processes in order to obtain equations describing direction of movement of a time series, may be applied, not only for rational ends indicated in the discussion of the law of organic growth, but also empirically where no a priori knowledge about the character of growth or law of movement or trend exists. Indeed, the search for such a law may have no bearing on the analysis; the trend may be sought for the purpose of isolating and studying cyclical movements. When trends are found without seeking to verify some hypothesis concerning a law of growth but merely with respect to given data, they are empirical trends.

Application of Empirical Trends to Cycle Analysis. The third factor mentioned at the beginning of this chapter as a force stimulating statistical analysis of time series has been the abstract study of the business cycle. Such abstract analysis has challenged the mathematical economist and the statistician to discover and to apply methods of statistical analysis that would measure the cycle.

Economic history of the modern era has been one of alternating periods of relative prosperity and relative depression and has also been characterized by periods of more or less violent speculative activity. The Mississippi Scheme and the South Sea Bubble burst in France and England in 1720, and there occurred commercial crises of major importance in 1763, 1772, 1783, and 1793. During the eighteenth century these recurring periods of crisis excited much discussion, but eighteenth-century writings dealt mainly with the dramatic surface events and did not develop a theory explaining them. And indeed the fundamental principles of economics were not formulated until the latter half of the eighteenth century. The publication of Adam Smith's Wealth of Nations in 1776 is usually taken as the debut of economics as a science.


¹ See pp. 283-297, and 570, 648.





While a group of economists following Adam Smith developed a theoretical explanation of the operation of economic forces under normal conditions, or in the long run, another group that assumed the role of critics of the “economists” developed theories of the business cycle. These were such men as Sismondi, Rodbertus, and Karl Marx. J. C. L. Simonde de Sismondi, an Italian Swiss, had originally been a thorough convert of Adam Smith and laissez faire and had become the Continental expositor of his theories; but as he said, writing in 1818 and referring to the depression of 1815-1817, he was deeply affected by the commercial crisis that Europe had experienced and by the cruel sufferings of the industrial workers that he had witnessed in Italy, Switzerland, and France and that all reports showed to have been at least as severe in England, in Germany, and in Belgium.¹ He set about developing a theory to explain the recurrence of such periods, and in his work are found many of the ideas current even today concerning the origin and explanation of the business cycle. He suggested that the business cycle is due to the faulty organization of the capitalist system and that the system is planless and therefore needs planning. He also suggested the explanation that what is needed is a better distribution of income. He suggested the oversaving hypothesis. His principal explanation is the inequality in the distribution of incomes resulting in glutting of the markets and the production of crises and depressions.

The idea that commercial crises are cyclical in character evolved early in the nineteenth century; some even went so far as to advance the theory that they occur every 7 or every 11 years. In 1875, this led the economist and statistician W. S. Jevons to propound a theory that the business cycle is due to cycles that occur in sun spots, which it had been discovered have a rhythm of about 11 years.² During the latter half of the nineteenth century a number of statistical attempts to discover the business cycle were made. The attempts used the idea of smoothing out the irregular fluctuations in the curves of raw data and

¹ Mitchell, op. cit., pp. 1-11. The historical material here given on the business cycle is taken principally from this source.

² For a more complete discussion of the history of business-cycle theory than it is possible to give here, see ibid. and also Ernst Wagemann, Economic Rhythm (1930), either of which contains further bibliographical references.




thereby clarifying the cyclical movements. The earliest examples of such statistical work appear to be in 1884.¹

Both Jevons and the later experimenters of the nineteenth century were content with attempts to discover cyclical movements in separate individual series. In 1909 Beveridge in England, in 1911 Julin in France, and in 1913 Mortara in Italy conceived the idea of combining a number of series into a composite statistical measure of the business cycle. The work of carrying out this task was then largely taken over by the American statisticians in the construction of the so-called “barometers” of business conditions that have been described in the chapter on Index Numbers. The period up to about 1914 may be characterized as one during which interest in the subject of the business cycle was intense. Economists were in sharp controversy with the business-cycle theorists, denying emphatically the implications that they drew from their analysis of the statistics available and from their theoretical explanations of the business cycle. At the same time the disturbing theories of the business-cycle students had greater claim to general interest because they touched upon a more vital and present thing than was customarily dealt with by the conventional economist. The conventional economist was explaining how things tend to happen under normal conditions, and the business-cycle theorists loudly proclaimed that we never live under normal conditions and that the theories of the economist were therefore useless. At the same time the interest of the practical businessman was aroused by the desire for knowledge of the relationship between his own particular business and the general business cycle.

Development of Technique for Time-series Analysis. The pressure to develop a statistical technique to analyze the problem was thus very great, and the accumulation of available statistical material to analyze had been rapid for a number of years. The technique that developed assumed two general characteristics, one of which has since been extensively used, the other less frequently.

The first method of technique that developed was the discovery statistically of the cycle in time series by the removal of the

¹ Poynting, J. H., and R. H. Hooker, “A Comparison of the Fluctuations in the Price of Wheat and in the Cotton and Silk Imports into Great Britain,” Journal of the Royal Statistical Society, Vol. 47 (1884), pp. 34-64.




trend from a series of annual data. Trends were fitted empirically to the data by the method of least squares or some other method, most commonly by the method of least squares, using relatively short periods of time, say 9 to 19 years. The cyclical movements then were the measures of the movements of the data above and below the empirical trend. Prof. Willford I. King said, “Any particular type of fluctuation in which we happen to be interested can be successfully studied only when most of the other kinds of fluctuations have been eliminated.”¹

This is, of course, the raison d’être for the empirical trend analysis, which is primarily for the purpose of isolating the ordinary and the minor cycles. The major cycles or secondary secular movements are best studied by the Kuznets methods that have been described and illustrated. The methods of analysis used are essentially similar to those employed in empirical trend analysis, but the Kuznets logistic trend lines may be rationalized in terms of a law of organic growth.

The second method of technique that developed was the attempt to apply harmonic analysis or the periodogram to series of economic data, a different application of the method of least squares. This was the work of Henry L. Moore of Columbia University in his application of Fourier’s theorem, the mathematics of which Fourier had developed a century ago in his Théorie des mouvements de la chaleur dans les corps solides and for which he was feted by the Académie des Sciences in 1812.

Prof. Moore applied the mathematics of the periodogram to the records of rainfall in the corn belt of the United States, working out the periodogram equations for the cycles of rainfall; he discovered similar cycles in crops and introduced the harmonic analysis into modern statistical method. He says:² “The principal contribution of this essay is the discovery of the law and cause of economic cycles. The rhythm in the activity of economic life, the alternation of buoyant, purposeful expansion with aimless depression, is caused by the rhythm in the yield per acre of the crops; while the rhythm in the production of the

¹ Journal of the American Statistical Association, Vol. 19 (1924), p. 468.

² Economic Cycles: Their Law and Cause (1914). Cf. Crum, W. L., “Periodogram Analysis,” Chap. XI in H. L. Rietz, Handbook of Mathematical Statistics (1924). Also Brunt, David, The Combination of Observations (1931), Chaps. XI and XII.




crops is in turn caused by the rhythm of changing weather which is represented by the cyclical changes in the amount of rainfall. The law of the cycles of rainfall is the law of the cycles of the crops and the law of economic cycles.”

The mathematics of the harmonic analysis are somewhat complex, and this method has not attained the popularity that has been attached to the removal of empirical trend by using straight lines or second- or third-degree polynomials, where the mathematical analysis involved is quite simple.
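The idea behind the periodogram can be indicated with a short sketch. It is not Moore's own computation: it assumes the simple Schuster form, in which a sine and a cosine of each trial period are fitted to the series and the squared amplitude is taken as the intensity for that period. The fitted coefficients are exact least-squares values only when the series covers a whole number of trial periods; otherwise they are approximations. The 64-year series with an 8-year wave is invented for illustration.

```python
import math

def periodogram_intensity(series, period):
    """Squared amplitude of the sine wave of the given trial period.

    a and b are the cosine and sine coefficients of the fitted wave,
    computed about the mean of the series.
    """
    n = len(series)
    mean = sum(series) / n
    a = (2.0 / n) * sum((y - mean) * math.cos(2 * math.pi * t / period)
                        for t, y in enumerate(series))
    b = (2.0 / n) * sum((y - mean) * math.sin(2 * math.pi * t / period)
                        for t, y in enumerate(series))
    return a * a + b * b

# Hypothetical series: a pure 8-year wave of amplitude 3 observed for 64 years.
series = [3.0 * math.sin(2 * math.pi * t / 8) for t in range(64)]

# Try every whole-year period from 2 to 16 and keep the strongest.
intensities = {p: periodogram_intensity(series, p) for p in range(2, 17)}
best_period = max(intensities, key=intensities.get)
print(best_period, round(intensities[best_period], 4))
```

With real economic data the intensities are far less clear-cut than in this artificial case, which is part of the reason the method is harder to apply than the fitting of a simple trend line.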

Use of Functions of Arc Tangent and Orthogonal Polynomials in Trend Analysis. In recent years two modified forms of the conventional trend analysis by the method of least squares have been developed. In 1928 it was suggested that the inverse trigonometric function, or arc tangent, could be adapted to measuring trends in series behaving in the following manner:¹

1. A downward tendency approximating a straight line but of such nature that projection of a straight line into the future would lead to absurd results, that is, negative or ridiculously small positive values when comparatively large positive values only are possible.

2. Approximately a linear growth or decline followed by an abrupt change in level (rise or drop) and subsequent resumption of the early tendency.

3. Approximately a straight-line trend interrupted by a sharp rise or drop followed by another abrupt change in level and subsequent resumption of the early movement at the same or a different level.

The method was used successfully in fitting a trend to the annual prices of International Paper common stock for the period 1909-1926 and to the annual index of wholesale prices in the United States, 1900-1928.

The orthogonal analysis is a method invented for reducing the amount of arithmetical calculation involved in fitting polynomials to time series by the method of least squares, especially second- and third-degree polynomials or polynomials of higher degree. The fitting of a polynomial of higher than second degree to a time series involves laborious calculations, particularly if a considerable period of time is covered. This laborsaving method

¹ Carmichael, F. L., “The Arc Tangent in Trend Determination,” Journal of the American Statistical Association, Vol. 23 (1928), pp. 253-262.





is described in detail, together with tables of values to facilitate its use, in Chap. XXII.¹

¹ See pp. 600-615. Also cf. Jordan, Charles, “Approximation and Graduation According to the Principle of Least Squares by Orthogonal Polynomials,” The Annals of Mathematical Statistics, Vol. 3 (1932), pp. 257-357. Cf. Romanovsky, V., “Note on Orthogonalizing Series of Functions and Interpolation,” Biometrika, Vol. 19 (1927), pp. 93-99; Jordan, Charles, “Sur une série de polynomes dont chaque somme partielle représente la meilleure approximation d’un degré donné suivant la méthode des moindres carrés,” Proceedings of the London Mathematical Society, Vol. 20 (1921), pp. 297-325; and Dieulefait, Carlos E., “La determinación de la tendencia secular en las series económicas,” Gabinete de Estadística, Rosario, Argentine Republic (Santa Fe), Universidad Nacional del Litoral (1932), pp. 1-52. Cf. Fisher, R. A., Statistical Methods for Research Workers (4th ed., 1932), pp. 133-142.



CHAPTER XXI
TREND ANALYSIS

Empirical Trend vs. Rational Trend. Both empirical and rational trends are obtained by analysis from raw data; the difference between the two is that a rational trend can be explained in terms of long-time growth or decline, whereas an empirical trend has no meaning per se. The empirical trend is a useful tool of analysis, as will be seen in the ensuing discussion.

In the preceding chapter the attempt was made to convey the idea that a rational trend is one that is found for its own sake; it has a rational explanation and is useful as a method of interpretation in itself. While it may be true that the rationalization that is made with respect to such trends is preliminary or even tentative, nevertheless the original intent is to make a rational use of them. Empirical trends are those for which there is admittedly no rational basis at the start, being used merely as a convenient method of removing from the data longer time movements that obscure the shorter time cyclical fluctuations.

Empirical trends in themselves may have no rational significance as a description of any type of long-time growth or movement. An empirical trend calculated for a period of 9 years at a point in time coincident with the peak of a secondary secular movement would presumably be in the form of a parabola. At another point in time, a 9-year trend analysis may give a straight line, or a logarithmic line. If a trend line happened to be calculated for a period of time from the low point of a secondary secular movement to the high point of another, the empirical trend might assume the form of a Verhulst growth curve, but it may have no such significance as a law of growth in that case, being simply the result of happening to take an empirical trend for that period of time. An examination of the heavy black curve representing the secondary trends in cement production (Fig. 141, page 556) will help to make clear what is meant by these statements.





Detecting Cycle by Removing Empirical Trend. While empirical trends may have no rational significance per se, the fitting of an empirical trend to the annual data of a time series will make it possible to isolate the residuals from the trend. These residuals constitute the cycles and minor cycles of the period analyzed. The first clear statement of the analysis of time series by this method was made by W. M. Persons in 1915.¹ The method is illustrated by examples at the end of this chapter.
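The removal of an empirical trend can be sketched in a few lines. The sketch below assumes a straight-line trend fitted by least squares with the time origin at the middle year, the method developed later in this chapter; the annual figures are invented for illustration and the helper name `fit_centered_line` is hypothetical.

```python
def fit_centered_line(values):
    """Least-squares straight line y' = a + b*t with t centered so that sum(t) = 0.

    Assumes an odd number of equally spaced annual observations.
    """
    n = len(values)
    half = n // 2
    t_values = [i - half for i in range(n)]   # e.g. -5, ..., 0, ..., +5
    a = sum(values) / n                       # a = sum(y) / N
    b = (sum(t * y for t, y in zip(t_values, values))
         / sum(t * t for t in t_values))      # b = sum(t*y) / sum(t*t)
    return t_values, a, b

# Hypothetical annual series for 1927-1937 (invented figures, not from the text).
years = list(range(1927, 1938))
values = [100, 104, 111, 109, 106, 112, 121, 125, 122, 124, 133]

t_values, a, b = fit_centered_line(values)
trend = [a + b * t for t in t_values]

# The residuals are what remains for study as cyclical or irregular movement.
residuals = [y - y_hat for y, y_hat in zip(values, trend)]

for year, y, r in zip(years, values, residuals):
    print(year, y, round(r, 2))
```

The residuals printed in the last column sum to zero, as they must for a least-squares line; whether they represent a cycle is a matter of interpretation, not of arithmetic.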

Thus the function of empirical trend analysis is to obtain an approximation to some longer term movement for the purpose of eliminating this in order to study shorter term movements of a cyclical or accidental character. The empirical trend may approximate a segment of a long-term cyclical movement, or it may approximate a portion of long-term growth in the series that might itself have rational explanation. What the empirical trend measures depends upon the circumstances in each problem, and the discovery of the rational nature of an empirical trend depends upon a priori knowledge.

Methods of Fitting Trend. Three methods of fitting trend to 
time series can be distinguished: (1) the method of least squares, 
(2) the method of selected points, and (3) the method of 
averages. 

Fitting a Trend Line by the Method of Least Squares. Figure 142 represents a plane in which there are seven points, $P_1, \ldots, P_7$. To simplify the arithmetic an uneven number of points is taken, and the middle point is selected for the location of the $y$-axis.² Accordingly, $t$ varies from $-3$ to $+3$. The coordinates of the points, as may be observed from the figure, are as follows:

$$P_1(t = -3,\ y_1) \quad P_2(t = -2,\ y_2) \quad P_3(t = -1,\ y_3)$$

$$P_4(t = 0,\ y_4) \quad P_5(t = 1,\ y_5) \quad P_6(t = 2,\ y_6) \quad P_7(t = 3,\ y_7)$$

¹ American Economic Review, December, 1916, pp. 739-769; Publications of the American Statistical Association, June, 1917, pp. 603-642; Harvard Review of Economic Statistics, Preliminary Vol. 1 (1919). Cf. Mitchell, W. C., Business Cycles: The Problem Stated and Its Setting, pp. 200, 212-213, 328-330.

² For statistical purposes it is more convenient to take a more recent year as the time origin than that of the birth of Christ. Thus, if a given set of data run from 1927, say, to 1937, it might be convenient to choose 1932 as the zero year. If this were done, then 1933 would be t = 1, 1935 would be t = 3, 1929 would be t = -3, etc.





The corresponding points on the straight line to be found, for example, point A in the figure, may be represented by the following coordinates:

$$(t = -3,\ y'_1) \quad (t = -2,\ y'_2) \quad (t = -1,\ y'_3) \quad (t = 0,\ y'_4) \quad (t = 1,\ y'_5) \quad (t = 2,\ y'_6) \quad (t = 3,\ y'_7)$$

The general form of the equation for a straight line in a field of coordinates $y$ and $t$ is $y = a + bt$, and for this line the equation is as follows:

$$y' = a + bt \tag{1}$$

The line is determined for the particular case by finding values of $a$ and $b$.



The line that is sought is the one from which the sum of the squared deviations of the points from the line is less than such a sum with respect to any other line. This is the least-squares criterion.

The vertical residuals of particular points from the line are as follows, as illustrated for one of the points in Fig. 142:

$$r_1 = y_1 - y'_1, \quad r_2 = y_2 - y'_2, \quad \ldots, \quad r_7 = y_7 - y'_7$$

Some of these variations (designated as $r$) are negative, while others are positive. When





squared, however, they are all positive, and the condition that must be satisfied according to the least-squares criterion for a line that will best fit these points is that $\Sigma r^2$ = minimum, in other words, that

$$\Sigma (y - y')^2 = \text{minimum} \tag{2}$$

The value of $y'$, from Eq. (1), may be substituted; Eq. (2) then becomes

$$\Sigma (y - a - bt)^2 = \text{minimum} \tag{3}$$

The condition under which Eq. (3) is true is that the total differential is equal to zero, in other words, that

$$\frac{\partial(\Sigma r^2)}{\partial a}\,da + \frac{\partial(\Sigma r^2)}{\partial b}\,db = 0$$

Inasmuch as $da$ and $db$ cannot be equal to zero, this gives the two conditions that*

$$\frac{\partial(\Sigma r^2)}{\partial a} = \frac{\partial}{\partial a}\,\Sigma (y - a - bt)^2 = -\Sigma\, 2(y - a - bt) = 0$$

$$\frac{\partial(\Sigma r^2)}{\partial b} = \frac{\partial}{\partial b}\,\Sigma (y - a - bt)^2 = -\Sigma\, 2t(y - a - bt) = 0$$

and hence the following two equations, by canceling out the 2's and carrying out the summations:

$$\Sigma y = Na + b\,\Sigma t \tag{i}$$

$$\Sigma ty = a\,\Sigma t + b\,\Sigma t^2 \tag{ii}$$

In these two equations, all the terms are known, except $a$ and $b$; because $\Sigma t = 0$ and $\Sigma y$ is the sum of the known $y$'s of the seven points $P_1, \ldots, P_7$. The $\Sigma t^2$ is

$$9 + 4 + 1 + 0 + 1 + 4 + 9 = 28$$

Because $\Sigma t = 0$, values for $a$ and $b$ can be found as follows:

$$a = \frac{\Sigma y}{N} \quad \text{from Eq. (i)}$$

$$b = \frac{\Sigma ty}{\Sigma t^2} \quad \text{from Eq. (ii)}$$

* In the case under consideration, it is not necessary to be concerned with 
the possibility that these same conditions might also hold true for a maxi- 
mum or a minimum, since the conditions of the problem indicate that it 
is a minimum. 





Accordingly, the equation for the line of best fit, by the
criterion of least squares, is as follows:

\[ y' = \frac{\Sigma y}{N} + \frac{\Sigma ty}{\Sigma t^2}\,t \tag{5} \]

Numerical Illustration. As a more concrete illustration,
values will be assigned to the y's of the seven points, as follows
(t coordinates remaining as before):

P1(y = 5)   P2(y = 2)   P3(y = 7)   P4(y = 4)
P5(y = 6)   P6(y = 10)  P7(y = 8)

An orderly work sheet will be set up in order to find the required
sums; N, of course, is equal to 7.

Work Sheet for Finding Best-fitting Straight Line for Seven
Given Points

     t        y        ty       t²
    -3        5       -15        9
    -2        2        -4        4
    -1        7        -7        1
     0        4         0        0
     1        6         6        1
     2       10        20        4
     3        8        24        9
                   (+50, -26)
  Σt = 0   Σy = 42   Σty = 24   Σt² = 28

The equation for the best-fitting line according to the least-
squares criterion is therefore as follows [see Eq. (5)]:

\[ y' = \frac{42}{7} + \frac{24}{28}\,t \]

or

\[ y' = 6 + 0.86t \]
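
The arithmetic of the work sheet can be sketched in a few lines of Python. This is only an illustration of Eqs. (i), (ii), and (5) under the stated condition that the time origin makes Σt = 0; the function name `fit_centered_line` is a label introduced here, not one used in the text, and exact fractions are used so that the slope appears as 24/28 before rounding.

```python
from fractions import Fraction


def fit_centered_line(t, y):
    """Least-squares line y' = a + b*t, valid only when sum(t) == 0.

    Hypothetical helper: a = sum(y)/N and b = sum(t*y)/sum(t^2),
    i.e. Eqs. (i) and (ii) with the sum of t equal to zero.
    Inputs are assumed to be integers so that Fraction stays exact.
    """
    if sum(t) != 0:
        raise ValueError("choose the time origin so that sum(t) == 0")
    a = Fraction(sum(y), len(t))
    b = Fraction(sum(ti * yi for ti, yi in zip(t, y)),
                 sum(ti * ti for ti in t))
    return a, b


# The seven points of the numerical illustration.
t = [-3, -2, -1, 0, 1, 2, 3]
y = [5, 2, 7, 4, 6, 10, 8]
a, b = fit_centered_line(t, y)
print(a, b, round(float(b), 2))
```

The slope is carried as the exact fraction and rounded only for display, which mirrors the text's step from 24/28 to 0.86.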

It will be well to note what the equation says. First, with
each unit increase of t the line (that is, the value of y') rises by
0.86. This value, 0.86, is called the "slope" of the line, and it
is the tangent of the angle that the line makes with the t-axis
or with any line parallel to the t-axis. Second, it says that,
when t = 0, y' = 6. This means that the line passes through





the y-axis at a point +6 from the t-axis (when the y-axis is
located at the middle point in time).

If the y-axis were shifted from its present location to the
position t = -3, everything else remaining in its original
position, the value of the t coordinates of all the points P will
change to accord with the new location of the y-axis. Also, it is
to be noted that the above equation would then become

\[ y' = [6 - 3(0.86)] + 0.86t \]

or

\[ y' = 3.42 + 0.86t \]

since 3.42 will be the intercept on the new y-axis.
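
The change of origin can be written as a one-line rule: the slope is unchanged and the new intercept is the old line evaluated at the new origin. The helper name `shift_origin` below is an assumption made for this sketch.

```python
def shift_origin(a, b, new_origin):
    """Re-express y' = a + b*t with t measured from new_origin.

    new_origin is given in the OLD time units (here -3). The slope
    stays the same; the intercept becomes a + b*new_origin.
    Hypothetical helper, not a procedure named in the text.
    """
    return a + b * new_origin, b


a_new, b_new = shift_origin(6, 0.86, -3)
print(round(a_new, 2), b_new)
```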



Fitting Second- or Third-degree Curves. Second-, third-, or
even higher degree curves may similarly be fitted by the method
of least squares. It may happen that the points are distributed
in such a manner that a straight line does not fit. For example,
Fig. 143 shows seven points that would be better fitted by a
parabola. The general form of the equation for such a curve is

\[ y' = a + bt + ct^2 \]

The equations for finding values of a, b, and c, for such a best-
fitting parabola, are worked out on precisely the same principles
as those for finding a and b for the best-fitting straight line.¹
That is to say, the equation y' = a + bt + ct² is fitted to the
points so that

\[ \Sigma (y - y')^2 = \text{minimum} \tag{6} \]

and when the value of y' is substituted in this equation, it
becomes

\[ \Sigma (y - a - bt - ct^2)^2 = \text{minimum} \tag{7} \]

¹ For a better method of fitting polynomials by the method of least
squares, see Chap. XXII, Orthogonal Polynomial Trends.

When this expression is differentiated with respect to a, b,
and c, following the same method as in Eqs. (4), (i), and (ii),
the equations for finding a, b, and c are obtained, as follows:

\[ \Sigma y = Na + b\,\Sigma t + c\,\Sigma t^2 \tag{i} \]

\[ \Sigma ty = a\,\Sigma t + b\,\Sigma t^2 + c\,\Sigma t^3 \tag{ii} \]

\[ \Sigma t^2 y = a\,\Sigma t^2 + b\,\Sigma t^3 + c\,\Sigma t^4 \tag{iii} \]

A work sheet such as the following form (leaving out columns for
the uneven powers of t; they will all be zero, since
the zero value of t is selected in the middle of an odd number of
years) is used for finding values of a, b, and c.

Work Sheet for Finding Best-fitting Parabola for Seven Given
Points

     t        y        ty       t²y       t²        t⁴

  Σt = 0   Σy =     Σty =    Σt²y =    Σt² =     Σt⁴ =

Since the sums of the odd powers of t are zero, when the sums of
the columns in the work sheet are substituted in Eqs. (i), (ii),
and (iii) above, the three unknowns a, b, and c may be found by
solution of these equations.
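
A minimal sketch of the solution follows, assuming the origin is placed so that Σt = Σt³ = 0. Under that assumption Eq. (ii) gives b directly, and Eqs. (i) and (iii) form a pair in a and c. The function name and the test data (points lying exactly on y = 1 + 2t + 3t²) are illustrative choices made here; they are not the points of Fig. 143.

```python
from fractions import Fraction


def fit_centered_parabola(t, y):
    """Least-squares parabola y' = a + b*t + c*t^2 for centered integer t.

    With sum(t) = sum(t^3) = 0 the normal equations reduce to
        sum(y)     = N*a        + c*sum(t^2)
        sum(t*y)   = b*sum(t^2)
        sum(t^2*y) = a*sum(t^2) + c*sum(t^4)
    Hypothetical helper; integer inputs keep the fractions exact.
    """
    if sum(t) != 0 or sum(ti ** 3 for ti in t) != 0:
        raise ValueError("odd-power sums of t must be zero")
    n = len(t)
    s2 = sum(ti ** 2 for ti in t)
    s4 = sum(ti ** 4 for ti in t)
    sy = sum(y)
    sty = sum(ti * yi for ti, yi in zip(t, y))
    st2y = sum(ti * ti * yi for ti, yi in zip(t, y))
    det = n * s4 - s2 * s2            # determinant of the (a, c) pair
    a = Fraction(sy * s4 - s2 * st2y, det)
    b = Fraction(sty, s2)
    c = Fraction(n * st2y - s2 * sy, det)
    return a, b, c


# Illustrative data only: seven points lying exactly on 1 + 2t + 3t^2.
t = [-3, -2, -1, 0, 1, 2, 3]
y = [1 + 2 * ti + 3 * ti * ti for ti in t]
a, b, c = fit_centered_parabola(t, y)
print(a, b, c)
```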

Probability Theory Is Not Applied. It must be remembered that
the application of the least-squares criterion for obtaining the
line that best fits a time series does not involve the application
of the theory of least squares in the sense that the trend line
obtained is a most probable line, expressive of a law of move-
ment or growth in the probability sense.¹ As originally applied,
the theory of least squares had a definite connection with the
theory of probabilities because it was devised as a method of
obtaining a measure of the most probable orbit of a comet, etc.
In the fitting of a trend line to a single time series there is no
multiplicity of cases fluctuating in a normal distribution about the
trend line. The use of the least-squares criterion in trend fitting
for time series is merely the application by analogy of a method
that produces desired results; it gives an objective criterion for
finding the line of best fit. If the analyst can be satisfied with
a less objective method, he may use, for example, the method of
selected points, which will now be described.

¹ Cf. Kuznets, Simon S., Secular Movements in Production and Prices
(1930), p. 62, who cites W. H. R. Lexis, Zur Theorie der Massenerscheinungen in
der menschlichen Gesellschaft (Freiburg i. B., F. Wagner, Ed. 1877), pp. 31-33.
See also Tintner, Gerhard, "The Analysis of Economic Time Series,"
Journal of the American Statistical Association, Vol. 35 (1940), pp. 93-100.

Methods of Selected Points. One of the simplest methods
of determining the trend of a time series is to make the trend
"line" pass through certain points selected as representative of
normal values. This line¹ may be drawn in a purely freehand
fashion, or a mathematical equation may be determined such
that it is satisfied by the coordinates of the selected points.

To determine a unique mathematical equation in a given
case the number of selected points must be taken equal to the
number of parameters in the equation. Thus, if a straight-line
trend seems appropriate, two normal years are selected (pref-
erably near the ends of the series) and the values of a and b in
the equation y' = a + bt are so determined that the equation
is satisfied by the values of t and y for the selected points. If
a parabolic trend of the type y' = a + bt + ct² is deemed
appropriate, then three normal points must be selected to
determine the values of a, b, and c. In general, if a polynomial
of the nth degree is taken to portray the course of the trend,
viz., y' = a + bt + ct² + · · · + ktⁿ, then there must be n + 1
selected points, one for each parameter. The polynomial is the
simplest type of mathematical equation to employ for this
purpose. Other, more "rational" types may also be fitted by this
method, however, and its use in fitting a logistic curve is
described below.

The actual process of finding the mathematical equation of
the chosen type that is satisfied by the selected points consists
in solving n simultaneous equations, n being the number of
selected points (or the number of parameters to be determined).
Thus if (t1, y1) and (t2, y2) are the coordinates of the selected
points, the straight line y' = a + bt passing through these
points is given by the solution of the following equations for
a and b:

\[ y_1 = a + bt_1 \]
\[ y_2 = a + bt_2 \]

¹ "Line" is here used in the generic sense; it may be either straight or
curved.





For example, if the time scale is such that t1 = 3 and t2 = 9
and if the y values for these years (or months) are¹ y1 = 68 and
y2 = 110, then a and b are found by solving the equations

\[ 68 = a + 3b \]
\[ 110 = a + 9b \]

These yield a = 47 and b = 7; hence the equation for the given
trend is y' = 47 + 7t.

If the equation to be fitted is a second-degree parabola
y' = a + bt + ct² and if (t1, y1), (t2, y2), and (t3, y3) are the
coordinates of the selected points, then a, b, and c are determined
by solving the equations

\[ y_1 = a + bt_1 + ct_1^2 \]
\[ y_2 = a + bt_2 + ct_2^2 \]
\[ y_3 = a + bt_3 + ct_3^2 \]

Three equations are more difficult to solve than two, but if the
time scale is chosen so that t1 = 0, then these reduce to

\[ y_1 = a \]
\[ y_2 = a + bt_2 + ct_2^2 \]
\[ y_3 = a + bt_3 + ct_3^2 \]

or

\[ y_2 - y_1 = bt_2 + ct_2^2 \]
\[ y_3 - y_1 = bt_3 + ct_3^2 \]

and two equations are obtained for determining b and c, the
value of a being y1. For example, if the selected points are

(t1 = 0, y1 = 68)   (t2 = 6, y2 = 110)   (t3 = 12, y3 = 200)

then a = 68 and b and c may be found from the solution of the
equations

\[ 110 - 68 = 6b + 36c \]
\[ 200 - 68 = 12b + 144c \]

The results are b = 3 and c = 2/3 = 0.67; hence the parabola
which passes through the given points is

\[ y' = 68 + 3t + 0.67t^2 \qquad \text{(origin at } t_1 = 0\text{)} \]
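
Both worked examples amount to solving as many simultaneous linear equations as there are selected points. The following sketch does this by straightforward elimination with exact fractions; it is one possible way of carrying out the "repeated substitution", and the function name `polynomial_through_points` is introduced here for illustration only.

```python
from fractions import Fraction


def polynomial_through_points(points):
    """Coefficients [a, b, c, ...] of the polynomial through the points.

    One equation y_i = a + b*t_i + c*t_i^2 + ... is written per selected
    point and the system is solved by Gauss-Jordan elimination.
    Hypothetical helper; the t values must all be different.
    """
    n = len(points)
    rows = [[Fraction(t) ** k for k in range(n)] + [Fraction(y)]
            for t, y in points]
    for col in range(n):
        # Bring a row with a nonzero entry in this column into place.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [vr - factor * vc
                           for vr, vc in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


line = polynomial_through_points([(3, 68), (9, 110)])
parabola = polynomial_through_points([(0, 68), (6, 110), (12, 200)])
print(line, parabola)
```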

When higher degree polynomials are fitted in this way, the
simultaneous equations may be solved by repeated substitution

¹ The y values may be actual values or values estimated as normal.





or special methods making use of finite differences may be
employed.¹

Method of Averages. Even less refined methods of fitting
lines to data than those already described could be applied; in
fact, the analyst could, if he so desired, merely draw the line
that seems to fit the plotted data. The objection to this method
is that it is too subjective; no two people would draw the same
line. A certain degree of objectivity is secured by applying the
method of selected points, which has already been described, or
by using a modification of that method, namely, the method of
averages. The method of averages merely suggests a refine-
ment in the selection of the points. It can be illustrated by the
fitting of a straight line, but it could be applied to curves as well.

Work Sheet for Fitting a Straight-line Trend by the Method of
Averages

     t      y
     1      5
     2      2
     3      7     For t = 3, y is taken as the average of the first five
     4      4     y's; that is, 5.  (t1 = 3, y1 = 5)
     5      7
     6      8
     7     15
     8     19     For t = 8, y is taken as the average of the last five
     9     18     y's; that is, 15.  (t2 = 8, y2 = 15)
    10     15

The trend line is the straight line passing through the two
points t = 3, y' = 5 and t = 8, y' = 15. Following the same
procedure as that used in the method of selected points, the
parameters a and b are found by solving the following two
equations:

\[ 5 = a + 3b \]
\[ 15 = a + 8b \]

from which it is found that b = 2 and a = -1, so that the
trend line is y' = -1 + 2t.
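
A short sketch of the same computation is given below. It assumes the series is split into a first and a second half, each replaced by its mean point; how an odd number of observations should be divided is not stated in the text, so the rule used here (the extra item goes to the second group) is an assumption, as is the function name.

```python
def method_of_averages_line(t, y):
    """Straight line through the mean points of the two halves of a series.

    Hypothetical helper. For an odd number of observations the second
    group receives the extra item; that choice is an assumption made
    for this sketch only.
    """
    half = len(t) // 2
    t1, y1 = sum(t[:half]) / half, sum(y[:half]) / half
    rest = len(t) - half
    t2, y2 = sum(t[half:]) / rest, sum(y[half:]) / rest
    b = (y2 - y1) / (t2 - t1)
    a = y1 - b * t1
    return a, b


t = list(range(1, 11))
y = [5, 2, 7, 4, 7, 8, 15, 19, 18, 15]
a, b = method_of_averages_line(t, y)
print(a, b)
```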

Method of Moving Averages. Ordinarily the method of moving
averages is used with monthly data, but it could be used with
annual data if an appropriate number of years over which to
average or smooth the data could be determined. The difficulty
of determining the proper number of years for the averaging
period is one of the objections to this method; another objection
is that it does not give an equation of trend. The method of
moving averages is explained in Chap. XXIII, Seasonal Variation.

¹ For the latter, the reader is referred to E. T. Whittaker and G. Robinson,
The Calculus of Observations (1924), Chap. I.

Advantages of the Method of Least Squares. The advantage
of using the least-squares line is that it gives a line from which the
residuals add up to zero and when squared are a minimum; this
supplies an objective criterion to the fit of the line. In addition,
the least-squares method of trend fitting is a very flexible device
that can be widely applied and varied according to the type
of line desired. If a complex trend line is desired, a mathe-
matical procedure based upon the least-squares criterion is
handily available. The method of orthogonal polynomials
explained in the next chapter, for example, is an application of
the method of least squares.

ILLUSTRATIONS OF RATIONAL TRENDS

As indicated in the preceding chapter, rational trends are
likely to be logistic in character. The simplest type of logistic
curve is of the form y = ab^t, which may readily be reduced to a
straight line if the equation is expressed in logarithms as follows:

\[ \log y = \log a + t \log b \]

Trend of a Dying Institution. If the early development,
growth, and arrival at maturity of a new economic institution
follow the pattern suggested by Raymond B. Prescott, as
explained in the preceding chapter, presumably the disappear-
ance of a dying institution would follow a reversal of that pattern.
Thus, it would die slowly at first, then rapidly, and then slowly
again until it finally disappeared. If such is the case, the
appropriate equation to use is one of the Verhulst, Pearl-Reed,
or Gompertz types of curves. However, an economic institution
that is disappearing from the economic system might depart in
another manner; it might be struck a sudden devastating blow
by a new development that caused it to die or decline according
to the simple logistic curve y = ab^t. Such appears to be the
case with respect to a certain type of commercial bank credit
known as "open-market commercial paper." Many author-
ities on money and credit believe this to be a dying institution
in this country, and accordingly the downward trend illustrated





in Table 81 and Fig. 144 may be considered a rational trend.¹
The data used are annual average monthly volumes of open-
market commercial paper outstanding; and Table 81 is a work
sheet for calculating the straight-line logarithmic trend line for
these data, following the method indicated on pages 566 to 568.
Here, however, the straight line is fitted to the logarithms of the
data instead of to the data themselves.

The equation for this trend line is y' = ab^t, so that, by the
rule of logarithms,

\[ \log y' = \log a + t \log b \]

The two least-squares equations that would be obtained by the
method explained above are as follows:²

\[ \Sigma \log y = N \log a + \log b\,\Sigma t \]
\[ \Sigma\, t \log y = \log a\,\Sigma t + \log b\,\Sigma t^2 \]

Upon substituting the sums taken from the appropriate columns
of Table 81, this gives

\[ 36.18035 = 23 \log a \]
\[ -38.41844 = 1{,}012 \log b \]

from which

\[ \log a = 1.57306 \qquad \text{and} \qquad \log b = -0.037963 \]

Therefore, the equation of the best-fitting (according to the
least-squares criterion) logarithmic trend in this case is

\[ \log y' = 1.57306 - 0.037963t \]
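
The same fit can be checked with a brief sketch. The raw figures below are read from Table 81 (millions of dollars). The logarithms tabulated there have a characteristic one less than the logarithm of the figure in millions, so they correspond to the data expressed in tens of millions; the sketch follows that convention by taking log10(y/10), which is an interpretation made here so that the constants agree with the text.

```python
import math

# Annual averages, 1919-1941, as read from Table 81 (millions of dollars).
years = list(range(1919, 1942))
y = [1084, 1113, 749, 768, 834, 873, 743, 629, 585, 494, 322, 489,
     264, 106, 95, 156, 174, 188, 296, 239, 198, 234, 317]

t = [year - 1930 for year in years]           # origin at the middle year
logs = [math.log10(v / 10) for v in y]        # assumption: tens of millions

# With sum(t) = 0 the two least-squares equations separate.
log_a = sum(logs) / len(logs)
log_b = sum(ti * li for ti, li in zip(t, logs)) / sum(ti * ti for ti in t)

trend_1930 = 10 * 10 ** log_a                 # back to millions of dollars
print(round(log_a, 5), round(log_b, 6), round(trend_1930))
```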

When a logarithmic straight line is fitted to a time series by
the method of least squares, it is the sum of the squares of the
ratio residuals that is made a minimum, and not the sum of the
squares of the actual residuals as is the case where an arith-
metical straight line is fitted.³ It is the following expression
that is minimized:

\[ \Sigma (\log y - \log y')^2 \]

which is the same as

\[ \Sigma \left( \log \frac{y}{y'} \right)^2 \]

If the logarithm is expanded in a power series, this sum is seen
to be roughly equivalent, apart from the constant factor
(log10 e)² that enters when common logarithms are used, to

\[ \Sigma \left( \frac{y - y'}{y'} \right)^2 \]

Table 81. Work Sheet for Calculating Index of Normal and Trend

Straight-line logarithmic trend

Data: Open-market commercial paper outstanding. Annual averages of
monthly data
(In millions of dollars)

Equation of trend: log y' = 1.57306 - 0.037963t

 Year   Raw data   log y      t     t log y     t²    log y'   Trend   Index of
           y                                                     y'    computed
                                                                        trend
                                                                        y/y'
 1919    1,084    2.03503   -11   -22.38533    121   1.99065    979     110.7
 1920    1,113    2.04650   -10   -20.46500    100   1.95269    897     124.1
 1921      749    1.87448    -9   -16.87032     81   1.91473    822      91.1
 1922      768    1.88536    -8   -15.08288     64   1.87676    753     102.0
 1923      834    1.92117    -7   -13.44819     49   1.83880    690     120.9
 1924      873    1.94101    -6   -11.64606     36   1.80084    632     138.1
 1925      743    1.87099    -5    -9.35495     25   1.76288    579     128.3
 1926      629    1.79865    -4    -7.19460     16   1.72491    531     118.4
 1927      585    1.76716    -3    -5.30148      9   1.68695    486     120.4
 1928      494    1.69373    -2    -3.38746      4   1.64899    446     110.8
 1929      322    1.50786    -1    -1.50786      1   1.61102    408      78.9
 1930      489    1.68931     0     0            0   1.57306    374     130.7
 1931      264    1.42160     1     1.42160      1   1.53510    343      77.0
 1932      106    1.02531     2     2.05062      4   1.49713    314      33.8
 1933       95    0.97772     3     2.93316      9   1.45917    288      33.0
 1934      156    1.19312     4     4.77248     16   1.42121    264      59.1
 1935      174    1.24055     5     6.20275     25   1.38324    242      71.9
 1936      188    1.27416     6     7.64496     36   1.34528    222      84.7
 1937      296    1.47129     7    10.29903     49   1.30732    203     145.8
 1938      239    1.37840     8    11.02720     64   1.26936    186     128.5
 1939      198    1.29667     9    11.67003     81   1.23139    170     116.5
 1940      234    1.36922    10    13.69220    100   1.19343    156     150.0
 1941      317    1.50106    11    16.51166    121   1.15547    143     221.7

 N = 23    Σ log y = 36.18035    Σ t log y = -38.41844    Σt² = 1,012
 Σ(y/y') = 2,496.4

Source: Compiled from the Annual Report of the Federal Reserve Board, 1929, p. 121;
1933, p. 174; and from the Survey of Current Business, Annual Supplement (Vol. 20, 1940),

Fig. 144. Open-market commercial paper outstanding in the United States,
1919-1941. Logistic trend fitted by method of least squares.

For a dying institution, open-market commercial paper out-
standing showed remarkable vigor in the years 1933-1941, and
perhaps the monetary economists were premature in their
predictions. Whether or not they were is a matter for the
future to reveal.

¹ For explanations of the demise of open-market commercial paper see
B. H. Beckhart, The New York Money Market, Vol. 3, pp. 242-246; O. A.
Greef, The Commercial Paper House in the United States (1938), pp. 123-127;
P. Hunt, Portfolio Policies of Banks in the United States 1920-1929 (1940),
pp. 11-38.

² See pp. 566-567.

³ Cf. pp. 566-567.





Trend of a Growing Institution. Method of Selected Points
Illustrated. If the hypothesis made by Raymond B. Prescott
can be demonstrated or illustrated in real life, it should surely be
done by the development of the automobile during the past
three or four decades. Table 82 and Fig. 145 give an illustration
of the fitting of a rational trend that purports to represent this
type of growth, constituting thereby a test of this hypothesis.¹
They also illustrate the method of fitting a logistic curve of the
Pearl-Reed type by using selected points.

The equation of the curve may be written in the form

\[ y' = \frac{k}{1 + m} \]

in which

\[ m = e^{a + bt} \]

It is thus required to find three parameters, k, a, and b, which
is more conveniently done by first converting the equation into
logarithms, as indicated in the work sheet.

By using annual data, consisting of monthly average output
of passenger cars and trucks each year from 1903 to 1941, a
graph was made and from its examination the following selected
points were adopted:

   Year      1909        1922        1935
   t        t0 = 0     t1 = 13     t2 = 26
   y        y0 = 10    y1 = 250    y2 = 320

The values of the parameters k, a, and b may be found by using
the following equations:²

\[ k = \frac{2y_0 y_1 y_2 - y_1^2 (y_0 + y_2)}{y_0 y_2 - y_1^2} \]

\[ a = \log_e \frac{k - y_0}{y_0} \]

\[ b = \frac{1}{n} \log_e \frac{y_0 (k - y_1)}{y_1 (k - y_0)} \]

in which n is defined as the interval between successive selected
points, t1 - t0 = t2 - t1.

¹ Explained on pp. 553-555.

² Cf. Pearl, Raymond, Studies in Human Biology (1924), Chap. XXIV;
The Biology of Population Growth (1925), p. 22. Citations used from
F. E. Croxton and D. J. Cowden, Applied General Statistics (1939), pp. 444-
445, 852-853.





Thus, for the problem illustrated,

\[ k = \frac{2 \times 10 \times 250 \times 320 - 250 \times 250 \times 330}{10 \times 320 - 250 \times 250} = \frac{5(128 - 1{,}650)}{1.28 - 25} = \frac{-7{,}610}{-23.72} = 320.82630 \]

\[ a = \log_e \frac{320.82630 - 10}{10} = \log_e 31.082630 = 2.302585 \log_{10} 31.082630 = 2.302585(1.4925178) = 3.4366491 \]

\[ b = \frac{1}{13} \log_e \frac{10(320.82630 - 250)}{250(320.82630 - 10)} = \frac{2.302585}{13} \log_{10} \frac{70.82630}{7{,}770.6575} = 0.1771219\,(\log_{10} 0.00911458) \]

[log10 0.00911458 = 7.9597368 - 10]

\[ b = 0.1771219\,(-2.0402632) = -0.3613753 \]
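
The three-point computation can be sketched directly from the formulas for k, a, and b. The function name is an assumption made for this illustration, and natural logarithms are taken with `math.log` rather than through common logarithms as in the hand computation.

```python
import math


def pearl_reed_three_points(y0, y1, y2, n):
    """Parameters of y' = k / (1 + exp(a + b*t)) through three points.

    The points are (0, y0), (n, y1), (2n, y2), i.e. equally spaced in
    time with the origin at the first one. Hypothetical helper that
    follows the three formulas quoted in the text.
    """
    k = (2 * y0 * y1 * y2 - y1 ** 2 * (y0 + y2)) / (y0 * y2 - y1 ** 2)
    a = math.log((k - y0) / y0)
    b = math.log(y0 * (k - y1) / (y1 * (k - y0))) / n
    return k, a, b


k, a, b = pearl_reed_three_points(10, 250, 320, 13)
print(round(k, 4), round(a, 5), round(b, 6))
```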

As indicated in Table 82, the values for m for various values of
t are conveniently found by the use of logarithms; thus, since

\[ m = e^{a + bt} \]

\[ \log m = \log_{10} e^{(a + bt)} = 0.43429(a + bt) \qquad [\text{since } \log_{10} e = 0.43429] \]

or, for the example illustrated,

\[ \log m = 0.43429(3.4366491 - 0.3613753t) = 1.4925023 - 0.1569417t \]

For the year 1909, when t = 0, the value of log m is 1.4925023,
as may be seen from the work sheet (Table 82), and the values
of log m for other values of t are obtained by the successive
algebraic subtraction of the constant -0.1569417 through the
years preceding 1909 and by successive algebraic addition of the
constant -0.1569417 through the years subsequent to 1909.
These are the logarithms of m for the various values of t, that
is, for the various years. In the next column of the work sheet,
the antilogarithms are entered, which, when added to 1, are
divided into the constant k in order to find the trend values for
each year. These steps are shown in the next three columns of
Table 82. An index of normal, that is, y/y', is also calculated.
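
The steps of the work sheet (log m, m, 1 + m, trend, index) collapse into a single expression when an exponential function is available. The sketch below uses the constants computed in the text; the function names are introduced here, and the 1922 raw figure of 212.0 is the value read from Table 82.

```python
import math

# Constants from the worked example (origin t = 0 at 1909).
K, A, B = 320.82630, 3.4366491, -0.3613753


def logistic_trend(t):
    """Trend value y' = K / (1 + m), with m = exp(A + B*t)."""
    m = math.exp(A + B * t)
    return K / (1 + m)


def index_of_normal(y, t):
    """Raw figure as a percentage of the trend value, y/y'."""
    return 100 * y / logistic_trend(t)


# The curve should reproduce the three selected points.
print(round(logistic_trend(0), 3),
      round(logistic_trend(13), 3),
      round(logistic_trend(26), 3))
print(round(index_of_normal(212.0, 13), 1))   # 1922
```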



Table 82. Work Sheet for Calculating Index of Normal and Trend
Logistic trend of Pearl-Reed type

Data: Automobile production in the United States. Annual averages of
monthly data
(In thousands of cars)

 Year    t      log m          m          1 + m      Trend     Raw     Index
                                                     y' =      data    y/y'
                                                   k*/(1+m)     y
 1903   -6    2.4341525   271.7393     272.7393      1.176      0.9     76.5
 1904   -5    2.2772108   189.3261     190.3261      1.686      1.9    112.7
 1905   -4    2.1202691   131.9073     132.9073      2.414      2.1     87.0
 1906   -3    1.9633274    91.90234     92.90234     3.453      2.8     81.1
 1907   -2    1.8063857    64.03030     65.0303      4.933      3.7     75.0
 1908   -1    1.6494440    44.61122     45.61122     7.034      5.4     76.8
 1909    0    1.4925023    31.08152     32.08152    10.000     10.9    109.0
 1910    1    1.3355606    21.65515     22.65515    14.161     15.6    110.2
 1911    2    1.1786189    15.08757     16.0876     19.942     17.5     87.8
 1912    3    1.0216772    10.51177     11.5118     27.869     31.5    113.0
 1913    4    0.8647355     7.32380      8.3238     38.543     40.4    104.8
 1914    5    0.7077938     5.10263      6.1026     52.572     47.4     90.2
 1915    6    0.5508521     3.55510      4.5551     70.432     80.8    114.7
 1916    7    0.3939104     2.47691      3.4769     92.273    134.8    146.1
 1917    8    0.2369687     1.72571      2.7257    117.704    156.2    132.7
 1918    9    0.0800270     1.20234      2.2023    145.673     97.6     67.0
 1919   10   -0.0769147     0.83769      1.8377    174.581    161.1     92.3
 1920   11   -0.2338564     0.58364      1.5836    202.588    185.6     91.6
 1921   12   -0.3907981     0.40663      1.4066    228.082    134.7     59.0
 1922   13   -0.5477398     0.28331      1.2833    249.999    212.0     84.8
 1923   14   -0.7046815     0.19739      1.1974    267.938    336.2    125.5
 1924   15   -0.8616232     0.13752      1.1375    282.040    300.2    106.4
 1925   16   -1.0185649     0.095815     1.0958    292.773    355.5    121.4
 1926   17   -1.1755066     0.066756     1.0668    300.748    358.4    119.2
 1927   18   -1.3324483     0.046511     1.0465    306.568    283.4     92.4
 1928   19   -1.4893900     0.032405     1.0324    310.758    363.2    116.9
 1929   20   -1.6463317     0.022577     1.0226    313.742    446.5    142.3
 1930   21   -1.8032734     0.015730     1.0157    315.858    279.7     88.6
 1931   22   -1.9602151     0.010959     1.0110    317.348    199.1     62.7
 1932   23   -2.1171568     0.007636     1.0076    318.394    114.2     35.9
 1933   24   -2.2740985     0.0053199    1.0053    319.128    160.0     50.1
 1934   25   -2.4310402     0.0037065    1.0037    319.640    229.4     71.8
 1935   26   -2.5879819     0.0025824    1.00258   320.001    328.9    102.8
 1936   27   -2.7449236     0.0017992    1.00180   320.250    371.2    115.9
 1937   28   -2.9018653     0.0012535    1.00125   320.426    400.7    125.0
 1938   29   -3.0588070     0.0008734    1.00087   320.547    207.4     64.7
 1939   30   -3.2157487     0.0006085    1.00061   320.631    298.1     93.0
 1940   31   -3.3726904     0.0004239    1.00042   320.693    372.4    116.1
 1941   32   -3.5296321     0.0002954    1.00030   320.730    403.2    125.7

                                                        Σ(y/y') = 3,788.7†

Source: Data from Statistical Abstract of the United States, 1933, p. 334, and
Survey of Current Business, Annual Supplement, Vol. 12 (1932), and current
issues.

* k = 320.82630; see p. 579.

† If the curve had been fitted according to the least-squares criterion, this
sum would approximate a hundred times the number of years, that is, 3,900.





The results lend support to the hypothesis that automobile
production in the United States had a growth during those years
following the law of the Pearl-Reed logistic curve. The goodness
of fit of the trend is attested to, not only by the plotting of the
curve with the data in Fig. 145, but also by the fact that the
sum of the ratios of the raw data to the trend equals approxi-
mately a hundred times the number of years.

Fig. 145. Automobile production in the United States, 1903-1941. Pearl-Reed
curve fitted by method of selected points.





ILLUSTRATIONS OF EMPIRICAL TRENDS

The distinction between rational trends and empirical trends
lies, not in the method of calculation, but in the interpretation
and analytical use of the trend after it is calculated. Yet, in the
case of empirical trends, it frequently suffices to fit a trend line
of very simple character. Thus a straight line may be quite
adequate in some cases.

Straight-line Trend. Table 83 contains a work sheet for
calculating a straight-line trend in open-market commercial
paper outstanding for the period 1931-1941. Rationalization
of this trend is uncertain: it may be the commencement of a new
period of growth in what was supposed to be a dying institution,
Table 83. Work Sheet for Calculating Index of Normal and Trend
Straight line

Data: Open-market commercial paper outstanding. Annual averages of
monthly data
(In millions of dollars)

Equation of trend: y' = 206 + 12.49t (origin at 1936)

Source: Annual Report of the Federal Reserve Board, 1920, p. 121;
1935, p. 174; Survey of Current Business, Annual Supplement, Vol. 20
(1940), p. 47, and current issues, passim.

   (1)      (2)      (3)     (4)      (5)      (6)       (7)
   Year     Raw       t       t²       ty     Trend    Index of
            data                                y'     computed
             y                                          trend
   1931     264      -5      25     -1,320     144      183.3
   1932     106      -4      16       -424     156       67.9
   1933      95      -3       9       -285     168       56.5
   1934     156      -2       4       -312     181       86.2
   1935     174      -1       1       -174     194       89.7
   1936     188       0       0          0     206       91.3
   1937     296       1       1        296     218      135.8
   1938     239       2       4        478     231      103.5
   1939     198       3       9        594     243       81.5
   1940     234       4      16        936     256       91.4
   1941     317       5      25      1,585     268      118.3

          2,267       0     110      1,374             1,105.4*
   N = 11   Σy       Σt     Σt²       Σty

* This total is a check on the work sheet. It should be a hundred times the number
of years. Failure to check precisely is due to rounding.








or it may be merely a cyclical movement. At any rate, for the
period of 11 years selected the trend analysis makes possible a
better study of the shorter term cyclical or residual movements in
the data.

The work sheet contains all the information necessary to
calculate the equation of trend, which in this case is of the simple
form, y' = a + bt. As seen in Eq. (5), the equation is found
by the following:

\[ y' = \frac{\Sigma y}{N} + \frac{\Sigma ty}{\Sigma t^2}\,t \]

From the work sheet, for this particular problem,

Σy = 2,267    Σt = 0    Σt² = 110    Σty = 1,374    N = 11

Accordingly,

\[ a = \frac{2{,}267}{11} = 206 \qquad b = \frac{1{,}374}{110} = 12.49 \]

and the equation of trend is y' = 206 + 12.49t (origin at 1936).
It is necessary to specify the origin in order to know for which
year t = 0. If the origin were 1931, the equation would be
y' = 144 + 12.49t (origin at 1931); this equation describes the
same straight line as y' = 206 + 12.49t (origin at 1936).

Column (6) of the work sheet contains the solutions of the
trend equation for the respective values of t. Thus, for 1933,
t = -3, and the solution of the trend equation for that year is
y' = 206 + (-3)(12.49) = 168.

Column (7) is the index of computed trend, each y of the raw
data divided by the corresponding y' of the trend, and the result
expressed as a percentage. Thus, 264 is 183.3 per cent of 144.
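
The whole of this work sheet can be sketched as follows, using the raw figures of Table 83. The variable names are introduced here for illustration. The index is computed from the trend rounded to whole millions, which is an assumption about how the hand computation was carried out, so individual entries may differ from the printed ones in the last place.

```python
# Raw data of Table 83 (millions of dollars), 1931-1941.
years = list(range(1931, 1942))
y = [264, 106, 95, 156, 174, 188, 296, 239, 198, 234, 317]
t = [year - 1936 for year in years]          # origin at 1936

sum_y = sum(y)
sum_ty = sum(ti * yi for ti, yi in zip(t, y))
sum_t2 = sum(ti * ti for ti in t)

a = sum_y / len(y)                           # intercept at the origin year
b = sum_ty / sum_t2                          # annual increment

trend = [a + b * ti for ti in t]
# Assumption: the index is taken on the trend rounded to whole millions.
index = [round(100 * yi / round(tr), 1) for yi, tr in zip(y, trend)]

print(round(a), round(b, 2), round(trend[0]), index[0])
```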

Polynomial Empirical Trends. Laborsaving Devices. It is
possible to find a second-degree, third-degree, or higher degree
polynomial trend by the methods already illustrated. To fit a
second-degree polynomial, according to the least-squares criterion,
the work sheet would be like that illustrated in skeleton form
on page 570. But it is better, for practical use, to introduce two
important sets of laborsaving devices before proceeding to fit
the higher order polynomial trends. The first set of laborsaving





devices has to do with economy of calculation in the work sheet;
the second set has to do with solving the equations for different
values of t, therefore, with computing trend values for various
years.

Economy of Calculation in the Work Sheet. As already noted,
an economy was obtained by taking an odd number of years and
making the median year the origin, so that Σt = 0, Σt³ = 0, and
similarly the total of all odd powers of t will be equal to zero;
hence, columns in the work sheet for odd powers of t are not
required. In addition, the entry of columns in the work sheet
for the even powers of t may be avoided because Σt², Σt⁴, Σt⁶,
etc., can be computed from formulas. It can be shown by
algebraic derivation¹ that, if t runs integrally from t = ±1 to
t = ±(n - 1), in which n = (N + 1)/2, i.e., n = t (terminal value)
+ 1,

\[ \Sigma t^2 = \frac{n(n - 1)(2n - 1)}{3} \tag{8} \]

\[ \Sigma t^4 = \Sigma t^2 \left[ \frac{3n^2 - 3n - 1}{5} \right] \tag{9} \]

\[ \Sigma t^6 = \Sigma t^2 \left[ \frac{3n^2(n^2 - 2n) + 3n + 1}{7} \right] \tag{10} \]

By similar algebraic computation Σt⁸ can be evaluated in terms
of n, but it is preferable to use orthogonal polynomials if a trend
equation of fourth or higher degree is sought.
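
The three formulas can be checked against direct summation with a short sketch. The function name `even_power_sums` is an assumption; the two spot checks use series lengths that occur in this chapter (N = 23 for Table 81 and N = 11 for Table 83).

```python
def even_power_sums(n):
    """Sums of t^2, t^4, t^6 for t = -(n-1), ..., n-1, from Eqs. (8)-(10).

    Hypothetical helper; n = (N + 1) / 2 for an odd number N of years.
    Integer division is exact because each formula yields a whole number.
    """
    s2 = n * (n - 1) * (2 * n - 1) // 3
    s4 = s2 * (3 * n * n - 3 * n - 1) // 5
    s6 = s2 * (3 * n * n * (n * n - 2 * n) + 3 * n + 1) // 7
    return s2, s4, s6


def direct_sums(n):
    """The same three sums by brute-force addition, for comparison."""
    ts = range(-(n - 1), n)
    return tuple(sum(t ** p for t in ts) for p in (2, 4, 6))


print(even_power_sums(12)[0], even_power_sums(6)[0])   # N = 23 and N = 11
```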
A second economy for the work sheet is secured by using a
subtotal summation procedure by which aggregates S1, S2, S3,
S4, etc., are obtained. From these aggregates algebraic formulas
are used to compute as follows:

\[ \Sigma y = S_1 \]

\[ \Sigma ty = nS_1 - S_2 \]

\[ \Sigma t^2 y = n^2 S_1 - (2n + 1)S_2 + 2S_3 \]

\[ \Sigma t^3 y = n^3 S_1 - (3n^2 + 3n + 1)S_2 + 6(n + 1)S_3 - 6S_4 \]

in which n = (N + 1)/2.

¹ Cf. Ross, Frank A., "Formulae for Facilitating Computation in Time
Series Analysis," Journal of the American Statistical Association, Vol. 20
(1925), pp. 75-79. For method of proof see footnote 1, p. 586.





Table 84. Economical Work Sheet for Calculating Polynomial
Trend, Algebraic Illustration
(Method of least squares)

                     Data    Sets of subtotals
 Year    T     t      y      First              Second                    Third
 (1)    (2)   (3)    (4)     (5)                (6)                       (7)
 1937    1    -2     y1      y1                 y1                        y1
 1938    2    -1     y2      y1+y2              2y1+y2                    3y1+y2
 1939    3     0     y3      y1+y2+y3           3y1+2y2+y3                6y1+3y2+y3
 1940    4     1     y4      y1+y2+y3+y4        4y1+3y2+2y3+y4            10y1+6y2+3y3+y4
 1941    5     2     y5      y1+y2+y3+y4+y5     5y1+4y2+3y3+2y4+y5        15y1+10y2+6y3+3y4+y5

                     S1      S2                 S3                        S4


The subtotal summation process is illustrated algebraically in Table 84 and arithmetically in Table 85. The sum of column (4), $S_1$, is merely $\Sigma y$. Column (5) contains the first set of subtotals, which is obtained on the adding machine by taking a subtotal after entry of each item in column (4); the first subtotal in column (5) will thus be the first item of column (4), that is, $y_1$; the second subtotal in column (5) will be $y_1 + y_2$; the third subtotal will be $y_1 + y_2 + y_3$; etc. $S_2$ is the sum of these subtotals.

The second set of subtotals, column (6), consists of subtotals of the figures in the preceding column, column (5); thus the first subtotal in column (6) is $y_1$, the second subtotal is $2y_1 + y_2$, the third subtotal is $3y_1 + 2y_2 + y_3$, etc. $S_3$ is the sum of the second set of subtotals.

The third set of subtotals, column (7), consists of the subtotals of figures in column (6); and $S_4$ is the sum of this third set of subtotals.

This process of taking subtotals and aggregating the subtotals by columns to obtain $S_2$, $S_3$, $S_4$ can be repeated as many times as desired, depending on how high a degree of polynomial is to be fitted. If carried as far as $S_4$, a third-degree polynomial can be fitted.

A cross check on the work sheet is noted in Table 85: $S_1$ is equal to the final subtotal in column (5), $S_2$ is equal to the final subtotal in column (6), $S_3$ is equal to the final subtotal in column (7), etc.
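The subtotal procedure described above can be sketched in a few lines. This is an illustration only, not the authors' work sheet: the function names are hypothetical, and the five data values are assumed for the example (they are chosen so that their running subtotals are 2, 7, 15, 22, 31, the first set of subtotals shown in Table 85). The product sums are obtained from the aggregates by Eqs. (11) and compared with direct multiplication.

```python
from fractions import Fraction
from itertools import accumulate

def subtotal_aggregates(y, sets=3):
    """Return [S1, S2, ..., S_(sets+1)] by repeated running subtotals."""
    column = list(y)
    aggregates = [sum(column)]                 # S1 is the plain total of the data
    for _ in range(sets):
        column = list(accumulate(column))      # next set of running subtotals
        aggregates.append(sum(column))         # its column total is the next aggregate
    return aggregates

def product_sums(y):
    """Sum(y), Sum(ty), Sum(t^2 y), Sum(t^3 y) from S1..S4 by Eqs. (11)."""
    s1, s2, s3, s4 = subtotal_aggregates(y, sets=3)
    n = Fraction(len(y) + 1, 2)                # n = (N + 1) / 2
    return (
        s1,
        n * s1 - s2,
        n**2 * s1 - (2 * n + 1) * s2 + 2 * s3,
        n**3 * s1 - (3 * n**2 + 3 * n + 1) * s2 + 6 * (n + 1) * s3 - 6 * s4,
    )

y = [2, 5, 8, 7, 9]                            # assumed illustrative data
t = [-2, -1, 0, 1, 2]                          # years measured from the middle year
direct = tuple(sum(ti**k * yi for ti, yi in zip(t, y)) for k in range(4))

print(subtotal_aggregates(y))
print(product_sums(y) == direct)
```

The only operations needed to form the aggregates are additions, which is the economy the work sheet is designed to secure.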





Table 85. Economical Work Sheet for Calculating Polynomial Trend, Arithmetical Illustration
(Method of least squares)

Year    T     t    Data    Sets of subtotals
                    y      First    Second    Third
(1)    (2)   (3)   (4)     (5)      (6)       (7)

1937    1    -2     2        2        2         2
1938    2    -1     5        7        9        11
1939    3     0     8       15       24        35
1940    4     1     7       22       46        81
1941    5     2     9       31       77       158

Sum                31       77      158       287
                   S1       S2       S3        S4


From Table 84 it can be readily seen that algebraically

$$S_1 = y_1 + y_2 + y_3 + \cdots + y_N$$

$$S_2 = Ny_1 + (N - 1)y_2 + (N - 2)y_3 + \cdots + y_N \qquad (12)$$

$$S_3 = \frac{N(N + 1)}{2}y_1 + \frac{(N - 1)N}{2}y_2 + \frac{(N - 2)(N - 1)}{2}y_3 + \cdots + y_N$$

For the coefficient of $y_1$ in this sum is equal to the sum of the natural numbers from 1 to N, which equals $\frac{N(N + 1)}{2}$;¹ the coefficient of $y_2$ is the sum of the natural numbers from 1 to N - 1, which equals $\frac{(N - 1)N}{2}$; etc.

$$S_4 = \frac{N(N + 1)(N + 2)}{6}y_1 + \frac{(N - 1)N(N + 1)}{6}y_2 + \cdots + y_N$$

¹ This may be demonstrated as follows:

$$\Sigma T = 1 + 2 + 3 + 4 + \cdots + N$$

Also,

$$\Sigma T = N + (N - 1) + (N - 2) + (N - 3) + \cdots + 1$$

By adding,

$$2\Sigma T = (N + 1) + (N + 1) + (N + 1) + (N + 1) + \cdots + (N + 1) = N(N + 1)$$

and hence

$$\Sigma T = \frac{N(N + 1)}{2}$$









For the coefficient of $y_1$ is the sum of $\frac{N(N + 1)}{2}$ as N goes from 1 to N; this sum equals $\frac{N(N + 1)(N + 2)}{6}$; etc.


It will be convenient to express these sums in the following manner:

$$S_1 = \Sigma y$$

$$S_2 = \Sigma (N + 1 - T)y$$

In the case of $y_1$, $N + 1 - T = N$. In the case of $y_2$, $N + 1 - T = N - 1$. In the case of $y_3$, $N + 1 - T = N - 2$. Etc.

$$S_3 = \Sigma \frac{(N + 1 - T)(N + 2 - T)}{2} y$$

In the case of $y_1$,

$$\frac{(N + 1 - T)(N + 2 - T)}{2} = \frac{N(N + 1)}{2}$$

In the case of $y_2$,

$$\frac{(N + 1 - T)(N + 2 - T)}{2} = \frac{(N - 1)N}{2}$$

Etc.

$$S_4 = \Sigma \frac{(N + 1 - T)(N + 2 - T)(N + 3 - T)}{6} y \qquad (13)$$

In the case of $y_1$,

$$\frac{(N + 1 - T)(N + 2 - T)(N + 3 - T)}{6} = \frac{N(N + 1)(N + 2)}{6}$$

In the case of $y_2$,

$$\frac{(N + 1 - T)(N + 2 - T)(N + 3 - T)}{6} = \frac{(N - 1)N(N + 1)}{6}$$


In the above equations, if T is replaced by $t + \bar{T} = t + \frac{N + 1}{2}$ and if, by definition, $n = \frac{N + 1}{2}$, these equations become

$$S_1 = \Sigma y$$

$$S_2 = n\Sigma y - \Sigma ty$$

$$2S_3 = \Sigma (n - t)^2 y + n\Sigma y - \Sigma ty \qquad (14)$$

$$6S_4 = \Sigma (n - t)^3 y + 3\Sigma (n - t)^2 y + 2n\Sigma y - 2\Sigma ty$$

in which the unmarked $\Sigma$ refers to summations with respect to t from $-\frac{N - 1}{2}$ to $+\frac{N - 1}{2}$. If these equations are expanded and similar terms assembled, Eqs. (11) are obtained.





Table 86. Work Sheet for Calculating Trend, Second-degree Polynomial
(Method of least squares)

Data: Consumer expenditures for personal appearance and comfort. Annual data in millions of dollars.

Source of data: Survey of Current Business, October, 1942, p. 24.



By using Eqs. (11), page 584, the following values are obtained:

$$\Sigma y = 6,785$$

$$\Sigma ty = 7(6,785) - 46,963 = 532$$

$$\Sigma t^2y = 49(6,785) - 15(46,963) + 2(239,746) = 107,512$$

By using Eqs. (10) the following values are obtained:

$$\Sigma t^2 = \frac{7(6)(13)}{3} = 182$$

$$\Sigma t^4 = 182(25) = 4,550$$

To find the second-degree polynomial trend equation these values may be substituted in Eqs. (7), (i) to (iii), page 570, as follows:

(i)      6,785 = 13a + 182c
(ii)     532 = 182b;  b = 2.923
(iii)    107,512 = 182a + 4,550c
(i')     94,990 = 182a + 2,548c        (i) × 14
         12,522 = 2,002c;  c = 6.2547

(iii)    107,512 = 182a + 4,550c
(i'')    169,625 = 325a + 4,550c       (i) × 25
         62,113 = 143a;  a = 434

Accordingly, the second-degree polynomial equation of trend for the problem illustrated is $y' = 434 + 2.923t + 6.2547t^2$ (origin at 1935).

Finding Trend Values by Method of Finite Differences. The equation for a trend line having been found, the problem is to compute from this equation the values of y' pertaining to a given set of years. Direct substitution is laborious. Finite differences provide an easier method. The keystone of the latter method is the fact that the nth difference of a polynomial of the nth degree is constant. Hence, that constant nth difference having been determined, the other differences, and ultimately the desired trend values themselves, can all be computed by merely reversing the differencing process, i.e., by simple addition.

In the equation $y' = a + bt + ct^2$ the first difference, by definition, would be

$$\Delta^1 y' = a + b(t + 1) + c(t + 1)^2 - a - bt - ct^2 = b + 2ct + c$$

and, by definition, the second difference would be

$$\Delta^2 y' = b + 2c(t + 1) + c - b - 2ct - c = 2c$$


Table 87. Building Up a Polynomial by Finite Differences

(1)    (2)           (3)           (4)           (5)           (6)
 t     Fourth        Third         Second        First         Polynomial
       difference    difference    difference    difference    (trend) values

-4       -1             6            -48           220            305
-3       -1             5            -42           172            525
-2       -1             4            -37           130            697
-1       -1             3            -33            93            827
 0       -1             2            -30            60            920
 1                      1            -28            30            980
 2                                   -27             2          1,010
 3                                                 -25          1,012
 4                                                                987


Table 87 illustrates the building up of a polynomial by finite differences. The polynomial here is of the fourth degree, and hence its fourth differences are all identical.

A figure in a given line of column (6) algebraically added to the figure in the same line of column (5) gives the next figure for column (6); thus, 305 + 220 = 525, 525 + 172 = 697, etc. Similarly, a figure in a given line of column (5) added algebraically to the figure in the same line of column (4) gives the next figure for column (5); thus, 220 - 48 = 172, 172 - 42 = 130, etc. The same general rule applies to the figures in columns (2) and (3); thus, -48 + 6 = -42, -42 + 5 = -37, etc., and 6 - 1 = 5, 5 - 1 = 4, etc.

In the polynomial illustrated in Table 87, the polynomial is known to have a constant fourth difference. Hence, if the polynomial value and the differences of any one line are all known, then the differences and polynomial values for all other lines above or below the given line can be readily computed. Thus, if for t = 0, it is known that the polynomial value y' = 920, $\Delta^1 y'_0 = 60$, $\Delta^2 y'_0 = -30$, $\Delta^3 y'_0 = 2$, and the constant fourth difference is equal to -1, then, by working from right to left and up and down, the other values in the table can be built up. The first set of variable differences, in this case the third, can be built up cumulatively from the known $\Delta^3 y'_0 = 2$ and the constant difference -1. It is to be noted that in a downward direction, in this table, this constant difference is -1; so in building up from the bottom to the top the constant difference is algebraically -(-1), or +1. This rule follows also for the building up of the other differences.
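The building-up rule can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed procedure: the function name and the dictionary layout are hypothetical, and the starting line is the t = 0 line quoted above (y' = 920 with differences 60, -30, 2 and constant fourth difference -1).

```python
def build_by_differences(row0, steps_back, steps_forward):
    """Build lines of a difference table from the line at t = 0.

    row0 = [y', first diff, second diff, ..., constant top difference] at t = 0.
    Returns a dict mapping t to the full line at that t.
    """
    rows = {0: list(row0)}
    # Downward (later years): each figure plus the next-higher difference on the same line.
    for t in range(steps_forward):
        prev = rows[t]
        rows[t + 1] = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)] + [prev[-1]]
    # Upward (earlier years): reverse the addition, beginning with the constant difference.
    for t in range(0, -steps_back, -1):
        prev = rows[t]
        new = [None] * len(prev)
        new[-1] = prev[-1]
        for i in range(len(prev) - 2, -1, -1):
            new[i] = prev[i] - new[i + 1]
        rows[t - 1] = new
    return rows

rows = build_by_differences([920, 60, -30, 2, -1], steps_back=4, steps_forward=4)
trend = [rows[t][0] for t in range(-4, 5)]
print(trend)
```

Lines below t = 0 come out with every difference filled in, including those a hand work sheet would leave blank because they are not needed for the last trend values.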

Table 88. Aid for Computing Finite Differences at t = 0 in Polynomial $y' = a + bt + ct^2 + dt^3 + \cdots$








This method is of general validity and can be used to find values of a polynomial of any degree from knowledge of its value for one year and its differences for that same year. Fortunately, it is relatively easy to calculate the y' or polynomial value and the various differences for the year t = 0. If the form of the polynomial is $y' = a + bt + ct^2 + dt^3 + et^4 + \cdots$, then the polynomial value for t = 0 is y' = a. The first, second, and higher order differences for t = 0 can be computed with the help of Table 88.

The figures in Table 88 give the weights by which the parameters b, c, d, e, f, etc., must be multiplied to give the difference specified at the top of each column, as follows:

$$\Delta^1 y'_0 = b + c + d + e + \cdots$$

$$\Delta^2 y'_0 = 2c + 6d + 14e + 30f + \cdots$$

$$\Delta^3 y'_0 = 6d + 36e + 150f + \cdots$$

$$\Delta^4 y'_0 = 24e + 240f + \cdots$$

$$\Delta^5 y'_0 = 120f + \cdots$$

For a particular polynomial, each of these equations, of course, terminates with the coefficient of the highest power of t. Thus, for a second-degree polynomial $y' = a + bt + ct^2$, the formulas for the differences at t = 0 would be $\Delta^1 y'_0 = b + c$ and $\Delta^2 y'_0 = 2c$, the higher differences being zero since the second difference is the same for all values of t. For a third-degree polynomial,

$$y' = a + bt + ct^2 + dt^3$$

the formulas would be $\Delta^1 y'_0 = b + c + d$, $\Delta^2 y'_0 = 2c + 6d$, and $\Delta^3 y'_0 = 6d$. For a fourth-degree polynomial

$$y' = a + bt + ct^2 + dt^3 + et^4$$

the differences would be $\Delta^1 y'_0 = b + c + d + e$, $\Delta^2 y'_0 = 2c + 6d + 14e$, $\Delta^3 y'_0 = 6d + 36e$, and $\Delta^4 y'_0 = 24e$.

For higher degree polynomials, the table can be readily extended by the rule that a figure in a given line of a given column is equal to the number of the column multiplied by the sum of the two figures in the line above situated in the given column and in the column to the left, respectively. For example, 36 = 3(6 + 6), 24 = 4(0 + 6), etc.¹
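The extension rule lends itself to a short sketch. The code below is illustrative only: the function names are hypothetical, and the layout of the table (one line for each of the coefficients b, c, d, ..., one column for each order of difference) is inferred from the weights quoted in the equations above, since the body of Table 88 is not reproduced here.

```python
def difference_weights(max_power):
    """weights[j][k]: multiplier of the coefficient of t**j in the k-th difference at t = 0."""
    weights = {1: {1: 1}}                      # line for b: first difference has weight 1
    for j in range(2, max_power + 1):
        above = weights[j - 1]
        # column number times (figure above in the same column + figure above to the left)
        weights[j] = {k: k * (above.get(k, 0) + above.get(k - 1, 0))
                      for k in range(1, j + 1)}
    return weights

def differences_at_zero(coeffs):
    """coeffs = [a, b, c, ...]; returns [y'_0, first difference, second difference, ...] at t = 0."""
    degree = len(coeffs) - 1
    w = difference_weights(degree) if degree >= 1 else {}
    out = [coeffs[0]]
    for k in range(1, degree + 1):
        out.append(sum(coeffs[j] * w[j][k] for j in range(k, degree + 1)))
    return out

print(difference_weights(5)[5])                # weights applying to the coefficient f
print(differences_at_zero([0, 1, 1, 1, 1]))    # all of b, c, d, e set to 1
```

Setting every coefficient to 1 simply adds up the weights in each column, which gives a quick check of the table against the equations listed above.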

The use of finite differences to compute the trend values of a second-degree polynomial is illustrated in Table 89. The trend is $y' = 434 + 2.923t + 6.2547t^2$ (origin at 1935), calculated above in Table 86 for data on consumer expenditures for personal appearance and comfort in the United States, 1929-1941. In Table 89 the constant second difference is known to be 12.509, the first difference for t = 0 is 9.2,² and the trend value for t = 0 is y' = 434.0. These are first entered in the work sheet; then, since the constant second difference is positive, the remainder of the column of first differences is obtained by successively subtracting 12.509 to obtain first differences for earlier years and by successively adding 12.509 to obtain first differences for later years. Obtaining the trend values is illustrated as follows: 434.0 + 9.2 = 443.2, 443.2 + 21.7 = 464.9, etc.; for values before 1935, 434.0 - (-3.3) = 437.3, 437.3 - (-15.8) = 453.1, etc. Beginning at the top of the table, it will then be found that 641.5 - 65.9 = 575.6, 575.6 - 53.4 = 522.2, etc.

Table 89. Work Sheet for Computing Trend Values by Method of Finite Differences

Equation of trend: $y' = 434 + 2.923t + 6.2547t^2$
Value of $y'_0$ = 434
Value of $\Delta^1 y'_0$ = 2.923 + 6.2547 = 9.1777
Constant $\Delta^2 y'$ = 12.509

Year     t     Second        First         Trend value
               difference    difference    y'

1929    -6                    -65.9          641.5
1930    -5                    -53.4          575.6
1931    -4                    -40.8          522.2
1932    -3                    -28.3          481.4
1933    -2                    -15.8          453.1
1934    -1                     -3.3          437.3
1935     0      12.509          9.2          434.0
1936     1                     21.7          443.2
1937     2                     34.2          464.9
1938     3                     46.7          499.1
1939     4                     59.2          545.8
1940     5                     71.7          605.0
1941     6                                   676.7

¹ Cf. Whittaker and Robinson, op. cit., pp. 1-7.

² The first differences may be rounded without causing cumulative error.

While the explanation of the method of finite differences may be extended, its use in solving second-degree polynomials for various values of t is much more expeditious than the method of obtaining solutions to the equation for the various values of t by substitution in the equation. The labor involved in the longer method is great if the number of years is large or if the polynomial is of higher than a second degree. In contrast, the method of finite differences may be used without difficulty, and the arithmetic involved is always simple addition or subtraction.

A danger inheres in the use of finite differences, namely, that any error in the higher order differences is cumulated as the lower order differences are computed. For this reason, when the trend line is determined the coefficients of the higher powers of t should be carried out to a larger number of places than would be regarded as significant. If, for example, the coefficient of $t^4$ is rounded off to the fifth place, the maximum error in the fourth difference is 24 × 0.000005 = 0.00012, over a 7-year period. If the other coefficients have also been rounded off to the last place indicated, then the maximum error in

$$\Delta^3 y'_0 = 6(0.00005) + 36(0.000005) = 0.00048$$

in

$$\Delta^2 y'_0 = 2(0.0005) + 6(0.00005) + 14(0.000005) = 0.00137$$

in

$$\Delta^1 y'_0 = 0.005 + 0.0005 + 0.00005 + 0.000005 = 0.005555$$

and in $y'_0$ = 0.05. Thus, by the time $y'_6$, the seventh year, for example, has been computed, the maximum error in that figure becomes 0.11+. This is shown in Table 90.

Table 90. Maximum Cumulated Errors in Differences and Polynomial Values

Error in fourth difference = 0.000120 (constant)

 t     Error in third    Error in second    Error in first    Error in
       difference        difference         difference        y'

 0      0.000480          0.001370           0.005555          0.050000
 1      0.000600          0.001850           0.006925          0.055555
 2      0.000720          0.002450           0.008775          0.062480
 3      0.000840          0.003170           0.011225          0.071255
 4                        0.004010           0.014395          0.082480
 5                                           0.018405          0.096875
 6                                                             0.115280

The final error grows larger the further the work proceeds and thus makes it necessary to compute the coefficients of higher powers of t to several figures beyond the number of significant figures required in the computed trend values. A cross check on the method of finite differences would be to solve the polynomial equation for the terminal values of t.

The danger of cumulative error is reduced to a minimum by starting at t = 0 and accumulating upward through the -t's and accumulating downward through the +t's.
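The growth of the maximum error can be traced with the same building-up rule used for the trend values themselves. The sketch below is illustrative and rests on the assumptions of the example above: coefficients a, b, c, d, e rounded to one, two, three, four and five decimal places respectively, so that their maximum rounding errors are 0.05, 0.005, and so on. The variable names are hypothetical, and exact decimal arithmetic is used to avoid binary rounding.

```python
from decimal import Decimal

# Maximum rounding error in a, b, c, d, e (half a unit in the last place kept).
half_unit = [Decimal("0.05"), Decimal("0.005"), Decimal("0.0005"),
             Decimal("0.00005"), Decimal("0.000005")]

# Weights for the k-th difference at t = 0, as in the equations for a fourth-degree polynomial.
weights = {1: [1, 1, 1, 1], 2: [2, 6, 14], 3: [6, 36], 4: [24]}

# Line at t = 0: maximum error in y', then in the first through fourth differences.
row = [half_unit[0]] + [
    sum(w * h for w, h in zip(weights[k], half_unit[k:])) for k in range(1, 5)
]

history = [row]
for _ in range(6):                                   # carry the errors forward six years
    row = [row[i] + row[i + 1] for i in range(4)] + [row[4]]
    history.append(row)

max_error_y = [r[0] for r in history]
print(max_error_y)
```

By the seventh year the maximum error in the trend value has grown to a little over 0.11, more than twice the rounding error in the constant term alone.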
Analysis of Cycles by Empirical Trends. Data on plate-glass production in the United States, 1933-1941, have been selected, in order to illustrate how cycles may be studied by empirical trend analysis. Table 91 is a work sheet providing the figures needed to compute either a straight-line trend or any polynomial trend up to the third degree.

Table 91. Work Sheet for Computing Trend and Index of Normal
(Method of least squares)

Data: Production of plate glass, polished, in the United States.
(In millions of square feet, monthly)

Source: Survey of Current Business, Supplement, Vol. 20 (1940), p. 151; Vol. 21 (February, 1941), p. 99; Annual (March, 1942), p. S-33.

Year     t      y       Sets of subtotals
                        First     Second     Third

1933    -4      7.2       7.2       7.2        7.2
1934    -3      7.9      15.1      22.3       29.5
1935    -2     15.0      30.1      52.4       81.9
1936    -1     16.5      46.6      99.0      180.9
1937     0     16.0      62.6     161.6      342.5
1938     1      7.1      69.7     231.3      573.8
1939     2     11.8      81.5     312.8      886.6
1940     3     13.7      95.2     408.0    1,294.6
1941     4     15.9     111.1     519.1    1,813.7

Sum           111.1     519.1   1,813.7    5,210.7
               S1        S2        S3         S4


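By using Eqs. (10), the values of $\Sigma t^2$, $\Sigma t^4$, and $\Sigma t^6$ are found as follows:

$$\Sigma t^2 = \frac{n(n - 1)(2n - 1)}{3} = \frac{5(4)(9)}{3} = 60$$

$$\Sigma t^4 = \Sigma t^2 \cdot \frac{3n^2 - 3n - 1}{5} = 60 \cdot \frac{75 - 15 - 1}{5} = 708$$

$$\Sigma t^6 = \Sigma t^2 \cdot \frac{3n^2(n^2 - 2n) + 3n + 1}{7} = 60 \cdot \frac{75(25 - 10) + 15 + 1}{7} = 60(163) = 9,780$$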


By using Eqs. (11), the values of $\Sigma ty$, $\Sigma t^2y$, and $\Sigma t^3y$ are found as follows:

$$\Sigma y = S_1 = 111.1$$

$$\Sigma ty = nS_1 - S_2 = 5(111.1) - 519.1 = 36.4$$

$$\Sigma t^2y = n^2S_1 - (2n + 1)S_2 + 2S_3 = 25(111.1) - 11(519.1) + 2(1,813.7) = 694.8$$

$$\Sigma t^3y = n^3S_1 - (3n^2 + 3n + 1)S_2 + 6(n + 1)S_3 - 6S_4 = 125(111.1) - 91(519.1) + 36(1,813.7) - 6(5,210.7) = 678.4$$


From these, two trend lines may be computed, first a straight line, and second a third-degree polynomial, as follows:

Straight-line trend:

$$y' = \frac{111.1}{9} + \frac{36.4}{60}t$$

$$y' = 12.3 + 0.607t \qquad \text{(origin at 1937)}$$

Third-degree polynomial trend:

The normal equations are

$$\Sigma y = Na + b\Sigma t + c\Sigma t^2 + d\Sigma t^3$$

$$\Sigma ty = a\Sigma t + b\Sigma t^2 + c\Sigma t^3 + d\Sigma t^4$$

$$\Sigma t^2y = a\Sigma t^2 + b\Sigma t^3 + c\Sigma t^4 + d\Sigma t^5$$

$$\Sigma t^3y = a\Sigma t^3 + b\Sigma t^4 + c\Sigma t^5 + d\Sigma t^6$$

in which all the sums of the odd powers of t are equal to zero, so that the equations for finding a, b, c, d are as follows:

(i)            111.1 = 9a + 60c
(ii)           36.4 = 60b + 708d
(iii)          694.8 = 60a + 708c
(iv)           678.4 = 708b + 9,780d
(ii')          429.52 = 708b + 8,354.4d        (ii) × 11.8
(iv) - (ii')   248.88 = 1,425.6d;  d = 0.17457

(iii)          694.8 = 60a + 708c
(i')           1,310.98 = 106.2a + 708c        (i) × 11.8
(i') - (iii)   616.18 = 46.2a;  a = 13.34

Substituting d in Eq. (ii), b = -1.45325.

Substituting a in Eq. (i), c = -0.14891.

The third-degree polynomial trend equation is thus

$$y' = 13.34 - 1.45325t - 0.14891t^2 + 0.17457t^3 \qquad \text{(origin at 1937)}$$
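Because the odd power sums vanish, the four normal equations fall into two independent pairs, (i) with (iii) and (ii) with (iv). The sketch below solves each pair directly; it is an illustration under that assumption, with a hypothetical function name, and it uses the power-sum formulas of Eqs. (10) rather than a work sheet. Small differences from the hand solution in the last decimal places arise because the hand solution substitutes rounded values of a and d.

```python
def fit_cubic_centered(N, sy, sty, st2y, st3y):
    """Solve the third-degree normal equations when t is measured from the middle year."""
    n = (N + 1) / 2
    st2 = n * (n - 1) * (2 * n - 1) / 3
    st4 = st2 * (3 * n**2 - 3 * n - 1) / 5
    st6 = st2 * (3 * n**2 * (n**2 - 2 * n) + 3 * n + 1) / 7
    # Even pair, Eqs. (i) and (iii): sy = N a + st2 c ; st2y = st2 a + st4 c
    det_even = N * st4 - st2**2
    a = (sy * st4 - st2 * st2y) / det_even
    c = (N * st2y - st2 * sy) / det_even
    # Odd pair, Eqs. (ii) and (iv): sty = st2 b + st4 d ; st3y = st4 b + st6 d
    det_odd = st2 * st6 - st4**2
    b = (sty * st6 - st4 * st3y) / det_odd
    d = (st2 * st3y - st4 * sty) / det_odd
    return a, b, c, d

# Product sums for the plate-glass data, 1933-1941 (N = 9).
a, b, c, d = fit_cubic_centered(9, 111.1, 36.4, 694.8, 678.4)
print(round(a, 2), round(b, 4), round(c, 5), round(d, 5))
```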

By using the method of finite differences to solve for various trend values, from Table 88 above, at t = 0,

$$y'_0 = 13.34$$

and

$$\Delta^1 y'_0 = -1.45325 - 0.14891 + 0.17457 = -1.42759$$

$$\Delta^2 y'_0 = 2(-0.14891) + 6(0.17457) = 0.74960$$

$$\Delta^3 y'_0 = 6(0.17457) = 1.04742$$

which is a constant difference in this case.

In Table 92, trend values are built up for the problem by using the method of finite differences. First, opposite t = 0, the value of y' and the first, second, and third differences are entered. The constant third difference, 1.04742, is then subtracted successively in the -t direction (upward in the table); it is then added successively in the +t direction (downward in the table). For example, starting at t = 0, the second difference is 0.74960; the second difference at t = -1 is

0.74960 - 1.04742 = -0.29782

the second difference at t = -2 is

-0.29782 - 1.04742 = -1.34524, etc.

Starting again at t = 0, the second difference is again 0.74960; the second difference at t = +1 is

0.74960 + 1.04742 = 1.79702; etc.

The column of first differences is built up from the column of second differences. For example, starting at t = 0, the first difference is -1.42759; the first difference for t = -1 is then -1.42759 - (-0.29782) = -1.12977; the first difference for t = -2 is -1.12977 - (-1.34524) = +0.21547; etc. Again, starting at t = 0 with the first difference -1.42759, the first difference at t = +1 is -1.42759 + 0.74960 = -0.67799; the first difference at t = +2 is -0.67799 + 1.79702 = 1.11903; etc. The values of y' are found from the first differences in exactly the same manner as the first differences from the second differences.

Fig. 146. Production of plate glass, polished, in the United States, 1933-1941. Straight-line and third-degree polynomial trends shown with raw data.

Table 92. Work Sheet for Finding Trend by Method of Finite Differences

Equation of trend: $y' = 13.34 - 1.45325t - 0.14891t^2 + 0.17457t^3$ (origin at 1937)

Year     t     Third         Second        First         Trend value
               difference    difference    difference    y'

1933    -4                    -3.44008       6.04821        5.60
1934    -3                    -2.39266       2.60813       11.65
1935    -2                    -1.34524       0.21547       14.25
1936    -1                    -0.29782      -1.12977       14.47
1937     0      1.04742        0.74960      -1.42759       13.34
1938     1                     1.79702      -0.67799       11.91
1939     2                     2.84444       1.11903       11.23
1940     3                                   3.96347       12.35
1941     4                                                 16.32
The results of the trend analysis are shown graphically in Fig. 146. If it can be assumed that the period of 9 years covered by the whole period is a segment in a longer cyclical movement, the straight-line trend may be considered to measure a part of that longer cycle, namely, part or all of its upward movement. The shorter cycle is then shown by the polynomial trend. Plate-glass production appears to have gone through one complete short cycle from about 1934 to about mid-1940.



CHAPTER XXII 

ORTHOGONAL-POLYNOMIAL TRENDS 

Great economy in trend analysis is secured by the use of orthogonal polynomials, especially if the trend desired is of higher degree than a second-degree polynomial. It requires considerable space to explain and describe the method of orthogonal polynomials, which may seem to belie the fact of its economy in use, but the actual arithmetic of application is simple. When lines of regression involving more than three coefficients are fitted to time series by the least-squares criterion, the work of computation by the ordinary method increases very rapidly. Laborsaving devices introduced in the preceding chapter, including the use of the summation work sheet and the determination of $\Sigma t^2$, $\Sigma t^4$, $\Sigma t^6$, etc., by formula, help to keep the amount of calculation at a minimum; but further reduction in the amount of calculation and particularly in the magnitude of the figures that have to be handled is obtained by using orthogonal polynomials.

A "polynomial" is an algebraic expression of the form

$$a + bt + ct^2$$

which, for example, is a polynomial in t of the second degree. A polynomial in t of the fourth degree would be

$$a + bt + ct^2 + dt^3 + et^4$$

and so forth. "Orthogonal" polynomials are polynomials that bear a certain relationship to each other, to be described below. The use of orthogonal polynomials involves merely a special method of computing the coefficients of a trend line; the method of fitting is still the method of least squares.

One of the greatest advantages of using orthogonal-polynomial trends is that, if the investigator decides to fit either a higher or lower degree trend line than what he has already derived, the amount of work involved in these further calculations is reduced to a minimum. In fact, no extra work at all would be required to determine the equation for a trend line of lower degree, while the determination of an equation for a trend line of higher degree would require only the calculation of quantities pertaining directly to the added term and would not necessitate any recalculations of other quantities. The work already done will therefore not be wasted.

Orthogonal Polynomials. Suppose a variable t has a set of values, say from 0 to 3. If each of these values is substituted in a polynomial in t, the polynomial will take on a corresponding set of values. Thus, if $p_1 = t - 1.5$ is a given polynomial in t, then, as t has the values 0, 1, 2, and 3, $p_1$ has values -1.5, -0.5, +0.5, and +1.5. Another polynomial in t, say

$$p_2 = t^2 - 3t + 1$$

will have a different set of values; in this instance, it will have the values 1, -1, -1, and 1 when t has the values 0, 1, 2, and 3, respectively.

Orthogonal polynomials are those that bear special relationships to each other. The necessary condition for two polynomials to be orthogonal to each other is that the sum of their product for all values of t shall be equal to zero. That this necessary condition is met by $p_1 = t - 1.5$ and $p_2 = t^2 - 3t + 1$ is readily seen. Thus, when t = 0,

$$p_1p_2 = (t - 1.5)(t^2 - 3t + 1) = -1.5$$

when t = 1, $p_1p_2 = +0.5$; when t = 2, $p_1p_2 = -0.5$; and when t = 3, $p_1p_2 = +1.5$. Hence,

$$\Sigma p_1p_2 = -1.5 + 0.5 - 0.5 + 1.5 = 0$$

The polynomials $p_1 = t - 1.5$ and $p_2 = t^2 - 3t + 1$, accordingly, possess the orthogonal property.

In general, if a set of polynomials in t, say $p_1$, $p_2$, $p_3$, $p_4$, ..., form an orthogonal set, then it is necessary that the sum of the products of every pair be zero:

$$\Sigma p_1p_2 = 0, \quad \Sigma p_1p_3 = 0, \quad \Sigma p_1p_4 = 0, \ \ldots$$

$$\Sigma p_2p_3 = 0, \quad \Sigma p_2p_4 = 0, \ \ldots \qquad (1)$$

$$\Sigma p_3p_4 = 0, \ \ldots$$

These are the general conditions that must be satisfied by orthogonal polynomials. Notice that they are equivalent to the conditions that the correlation between each pair of polynomials is zero.

Trend Line in Orthogonal Polynomials by the Method of Least Squares. The form of a trend-line equation that has so far been used is $y' = a + bt + ct^2 + dt^3 + \cdots$. This is an arbitrary form, however, and it is to be noted that other forms of the identical equation are possible. This can be illustrated numerically as follows:

The equation $y' = 105.3 + 8.1t - 0.7t^2$ is identically the same as $y' = 115 + 6(t - 1.5) - 0.7(t^2 - 3t + 1)$, which may be proved by multiplying out the expressions in the latter equation and collecting like terms. If the use of the second form has any advantage over the use of the first, there is no reason why it may not be adopted.

Suppose, now, that instead of fitting a trend line in the form $y' = a + bt + ct^2 + dt^3 + \cdots$, the fitting process is carried out with respect to the form

$$y' = A + Bp_1 + Cp_2 + Dp_3 + Ep_4$$

in which $p_1$, $p_2$, $p_3$, and $p_4$ are polynomials in t of the first, second, third, and fourth degree, respectively, that are orthogonal to each other and to unity, that is to say, where $p_1$ is a polynomial in t of the form $p_1 = k_{10} + t$, $p_2$ is a polynomial in t of the form $p_2 = k_{20} + k_{21}t + t^2$, $p_3$ is a polynomial in t of the form

$$p_3 = k_{30} + k_{31}t + k_{32}t^2 + t^3, \text{ etc.}$$

and where $\Sigma p_1 = 0$, $\Sigma p_2 = 0$, $\Sigma p_3 = 0$, $\Sigma p_4 = 0$, and $\Sigma p_1p_2 = 0$, $\Sigma p_1p_3 = 0$, $\Sigma p_1p_4 = 0$, $\Sigma p_2p_3 = 0$, etc. With reference to the arithmetical illustration given above, which was a second-degree polynomial, this is equivalent to deriving a trend line of the form

$$y' = 115 + 6(t - 1.5) - 0.7(t^2 - 3t + 1)$$

instead of the usual form $y' = 105.3 + 8.1t - 0.7t^2$.

Either method will, of course, give the same result; for, whichever form is derived, it can be converted into the other by simple algebra. It is the purpose of this section to show the simplification gained by using the orthogonal-polynomial form rather than the usual form. The problem of finding the forms of the polynomials themselves, i.e., the values of the k coefficients, will be left for a subsequent section.





If a trend line is put in the orthogonal-polynomial form

$$y' = A + Bp_1 + Cp_2 + Dp_3$$

and is then fitted by the method of least squares, i.e., if A, B, C, and D are determined so that

$$\Sigma (y - y')^2 = \Sigma (y - A - Bp_1 - Cp_2 - Dp_3)^2$$

is made a minimum, the following conditions are obtained:

$$\Sigma (y - A - Bp_1 - Cp_2 - Dp_3) = 0$$

$$\Sigma p_1(y - A - Bp_1 - Cp_2 - Dp_3) = 0$$

$$\Sigma p_2(y - A - Bp_1 - Cp_2 - Dp_3) = 0$$

$$\Sigma p_3(y - A - Bp_1 - Cp_2 - Dp_3) = 0$$

or

$$\Sigma y = NA + B\Sigma p_1 + C\Sigma p_2 + D\Sigma p_3$$

$$\Sigma p_1y = A\Sigma p_1 + B\Sigma p_1^2 + C\Sigma p_1p_2 + D\Sigma p_1p_3$$

$$\Sigma p_2y = A\Sigma p_2 + B\Sigma p_1p_2 + C\Sigma p_2^2 + D\Sigma p_2p_3$$

$$\Sigma p_3y = A\Sigma p_3 + B\Sigma p_1p_3 + C\Sigma p_2p_3 + D\Sigma p_3^2$$

But since 1, $p_1$, $p_2$, and $p_3$ form an orthogonal set (by assumption), it follows that $\Sigma p_1 = 0$, $\Sigma p_2 = 0$, $\Sigma p_3 = 0$, $\Sigma p_1p_2 = 0$, $\Sigma p_1p_3 = 0$, and $\Sigma p_2p_3 = 0$. Hence the above equations reduce to

$$\Sigma y = NA$$

$$\Sigma p_1y = B\Sigma p_1^2$$

$$\Sigma p_2y = C\Sigma p_2^2$$

$$\Sigma p_3y = D\Sigma p_3^2$$

and therefore

$$A = \frac{\Sigma y}{N}, \qquad B = \frac{\Sigma p_1y}{\Sigma p_1^2}, \qquad C = \frac{\Sigma p_2y}{\Sigma p_2^2}, \qquad D = \frac{\Sigma p_3y}{\Sigma p_3^2} \qquad (2)$$

The simple form of these solutions will be noted. It will also be noted that the solution for A is independent of $p_1$, $p_2$, and $p_3$ and that the solution for B depends only upon $p_1$, the solution for C only upon $p_2$, and the solution for D only upon $p_3$. This means that the value of A would have been the same whether a first-, second-, or third-degree trend line had been fitted. Similarly, the value of B would have been the same whether a first-, second-, or third-degree trend had been fitted, and the value of C would have been the same whether a second- or third-degree trend line had been fitted. For if $y' = A + Bp_1$ had been fitted, the solutions would still have been

$$A = \frac{\Sigma y}{N} \quad \text{and} \quad B = \frac{\Sigma p_1y}{\Sigma p_1^2}$$

If $y' = A + Bp_1 + Cp_2$ had been fitted, the solutions would still have been

$$A = \frac{\Sigma y}{N}, \qquad B = \frac{\Sigma p_1y}{\Sigma p_1^2}, \quad \text{and} \quad C = \frac{\Sigma p_2y}{\Sigma p_2^2}$$

The addition of the term $Cp_2$ does not therefore change the values obtained for A or B, and the addition of the term $Dp_3$ does not change the values obtained for A, B, or C. It also can be seen that if a fifth term were added to the trend line, namely, $Ep_4$, making it a fourth-degree trend, the value of E would be given by $E = \Sigma p_4y / \Sigma p_4^2$ and the values of A, B, C, and D would be the same as before. It is this simplicity and independence of the solutions of the least-squares equations when orthogonal polynomials are used that give the orthogonal-polynomial method its main advantage over the ordinary method.
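The independence of the solutions can be seen on a small scale with the polynomials $p_1 = t - 1.5$ and $p_2 = t^2 - 3t + 1$ of the earlier illustration. The sketch below is illustrative only: the four data values are hypothetical, as are the helper names. Each coefficient is computed by Eqs. (2), and the least-squares conditions are then checked for both the straight line and the second-degree trend built from the same A and B.

```python
t_values = [0, 1, 2, 3]
y = [3.0, 5.0, 4.0, 8.0]                         # hypothetical data for illustration
ones = [1.0] * len(t_values)
p1 = [t - 1.5 for t in t_values]
p2 = [t**2 - 3 * t + 1 for t in t_values]

def dot(u, v):
    """Sum of products of two columns."""
    return sum(a * b for a, b in zip(u, v))

# Eqs. (2): each coefficient uses only its own polynomial.
A = sum(y) / len(y)
B = dot(p1, y) / dot(p1, p1)
C = dot(p2, y) / dot(p2, p2)

line = [A + B * u for u in p1]                   # first-degree trend
parabola = [A + B * u + C * v for u, v in zip(p1, p2)]   # second-degree trend, same A and B

def residual_checks(fitted, basis):
    """Least-squares conditions: residuals summed against each fitted term should be zero."""
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return [dot(resid, b) for b in basis]

print(A, B, C)
print(residual_checks(line, [ones, p1]))
print(residual_checks(parabola, [ones, p1, p2]))
```

Both sets of conditions are satisfied with the same A and B, so adding the term in $p_2$ requires no recalculation of the earlier coefficients.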

Forms of Orthogonal Polynomials Used. The forms of the orthogonal polynomials to be used for fitting trends can be generalized; what is required is to find the k's in terms of the given values of t and the number of years involved. The condition has been laid down that $p_1 = k_{10} + t$, $p_2 = k_{20} + k_{21}t + t^2$, and $p_3 = k_{30} + k_{31}t + k_{32}t^2 + t^3$, etc., are to be polynomials of the first, second, and third degree in t, respectively, that are orthogonal to each other and to unity. The problem is to make use of this condition to determine values for the k's in terms of the given values of t. When this is done, it will be possible to find the actual values of A, B, C, and D, from the formulas of the preceding section.



By way of illustration, the forms of only $p_1$ and $p_2$ will be derived; the method can be readily extended to the determination of the forms of $p_3$ and of higher polynomials.

First, it is assumed that the time intervals T are measured from the mean $\bar{T}$, so that $p_1$ and $p_2$ become $p_1 = k_{10} + t$ and $p_2 = k_{20} + k_{21}t + t^2$, where $t = T - \bar{T}$. In addition, it is supposed that the time intervals to which the variable refers are equally spaced and without interruptions.¹ According to these assumptions, t will have a mean of zero, its highest value will be $+\frac{N - 1}{2}$, and its lowest value $-\frac{N - 1}{2}$. For example, if there are 5 years of data, the middle year will be 0, the first year -2, and the last year +2. If there are 4 years of data, the first year will be $-\frac{3}{2}$, the second year $-\frac{1}{2}$, the third year $+\frac{1}{2}$, and the last $+\frac{3}{2}$.

Accordingly, all the odd moments of t, such as $\Sigma t/N$, $\Sigma t^3/N$, and $\Sigma t^5/N$, will be zero; the even moments, such as $\Sigma t^2/N$, $\Sigma t^4/N$, and $\Sigma t^6/N$, are computable from simple formulas depending entirely on N, the number of years, as already noted in Chap. XXI.²

With these assumptions, the derivation of the form of the orthogonal polynomials, that is to say, the derivation of the values of $k_{10}$, $k_{20}$, and $k_{21}$, may now be undertaken. The condition that $p_1$, $p_2$, and 1 shall be orthogonal to each other requires that $\Sigma p_1 = 0$, $\Sigma p_2 = 0$, and $\Sigma p_1p_2 = 0$. These equations may be written as follows:

$$\Sigma p_1 = \Sigma (k_{10} + t) = Nk_{10} + \Sigma t = 0 \qquad \text{(i)}$$

$$\Sigma p_2 = \Sigma (k_{20} + k_{21}t + t^2) = Nk_{20} + k_{21}\Sigma t + \Sigma t^2 = 0 \qquad \text{(ii)}$$

$$\Sigma p_1p_2 = \Sigma p_2(k_{10} + t) = k_{10}\Sigma p_2 + \Sigma p_2t = 0 \qquad \text{(iii)}$$

From these equations, the values of the k's can readily be obtained. Since $\Sigma t = 0$, Eq. (i) gives $Nk_{10} = 0$, or $k_{10} = 0$, and Eq. (ii) gives $Nk_{20} + \Sigma t^2 = 0$, or $k_{20} = -\frac{\Sigma t^2}{N}$. From Eq. (ii), it is known that $\Sigma p_2 = 0$; hence, Eq. (iii) becomes $\Sigma p_2t = 0$. Substituting the equivalent of $p_2$, this gives the condition,

$$\Sigma p_2t = \Sigma (k_{20} + k_{21}t + t^2)t = k_{20}\Sigma t + k_{21}\Sigma t^2 + \Sigma t^3 = 0 \qquad \text{(iv)}$$

¹ These assumptions were made in the preceding chapter. Cf. Tables 84 to 86, Chap. XXI.

² Cf. pp. 584-586.



Since both $\Sigma t$ and $\Sigma t^3$ are equal to zero, this becomes

$$k_{21}\Sigma t^2 = 0$$

and hence $k_{21}$ must be zero, since $\Sigma t^2$ is not. The values of the k's, therefore, are as follows:

$$k_{10} = 0, \qquad k_{21} = 0, \qquad k_{20} = -\frac{\Sigma t^2}{N} \qquad (3)$$

and the forms of the polynomials $p_1$, $p_2$ are therefore

$$p_1 = t, \qquad p_2 = t^2 - \frac{\Sigma t^2}{N} \qquad (4)$$

But¹

$$\Sigma t^2 = \frac{n(n - 1)(2n - 1)}{3}$$

in which $n = \frac{N + 1}{2}$. Hence,

$$\frac{\Sigma t^2}{N} = \frac{N^2 - 1}{12}$$

Accordingly,

$$p_2 = t^2 - \frac{N^2 - 1}{12}$$

Similar methods of analysis may be used to derive the forms of $p_3$ and higher polynomials. The results obtained for polynomials up to the fifth degree may be listed as follows:²

$$p_1 = t$$

$$p_2 = t^2 - \frac{N^2 - 1}{12}$$

$$p_3 = t^3 - \frac{3N^2 - 7}{20}t$$

$$p_4 = t^4 - \frac{3N^2 - 13}{14}t^2 + \frac{3(N^2 - 1)(N^2 - 9)}{560} \qquad (5)$$

$$p_5 = t^5 - \frac{5(N^2 - 7)}{18}t^3 + \frac{15N^4 - 230N^2 + 407}{1,008}t$$

¹ Cf. Eqs. (10), Chap. XXI.

² Cf. Fisher, R. A., Statistical Methods for Research Workers, Section 27.



Thus, it is to be noted that a trend line can be fitted in two different forms, by the method of least squares; it can be fitted in the form $y' = a + bt + ct^2 + dt^3 + \cdots$ (where $t = T - \bar{T}$) by the methods described in the preceding chapter, or it can be fitted in the form $y' = A + Bp_1 + Cp_2 + Dp_3 + \cdots$ by the method of orthogonal polynomials. If the orthogonal-polynomial form is used in the fitting process, the ordinary form of the trend equation can readily be derived from the results; it should be repeated that the criterion of fit in each case is the least-squares criterion.

Calculalton of the Coe^ictents ^1, B, C, If the values of 

Vh Vh given in Eq (5) aie substituted in formulas 

for A, B, C, etc [Eqs (2)], the following values are 
obtained 


■ iV 






ISO 


■ - \){N- 




2,800 


SiN' - 1)(JV* - 4)(iV» - 9) 


_^^00 


iV(Ar* - - 4)(iV* - y)(iV* - i6) 

„ 098,644 

r = 


- Ar(;v* - l)(N- - 4)(iV» - 9)(iV» - 10)(A^> ~ 25) 


(X 




5(N^ 


15jy« - 230A^» 4- 407 


1,008 


i‘^y 


(0) 


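In order to illustrate the algebraic procedure by which the
above formulas are obtained, the formula for C will be derived,
as follows:

$$C = \frac{\Sigma p_2 y}{\Sigma p_2^2}$$

But

$$\Sigma p_2^2 = \Sigma\left(t^2 - \frac{N^2 - 1}{12}\right)^2 = \Sigma t^4 - \frac{2(N^2 - 1)}{12}\,\Sigma t^2 + \frac{(N^2 - 1)^2}{144}\,N$$

The formula for $\Sigma t^2 / N$, however, is $(N^2 - 1)/12$; hence

$$\Sigma t^2 = \frac{N(N^2 - 1)}{12}$$

Likewise, the formula for $\Sigma t^4 / N$ is
$\dfrac{N^2 - 1}{12} \cdot \dfrac{3N^2 - 7}{20}$; hence

$$\Sigma t^4 = \frac{N(N^2 - 1)}{12} \cdot \frac{3N^2 - 7}{20}$$

Therefore the denominator of C becomes

$$\frac{N(N^2 - 1)}{12} \cdot \frac{3N^2 - 7}{20} - \frac{2(N^2 - 1)}{12} \cdot \frac{N(N^2 - 1)}{12} + \frac{(N^2 - 1)^2}{144}\,N$$

Taking $N(N^2 - 1)/12$ out of each term,

$$\frac{N(N^2 - 1)}{12}\left[\frac{3N^2 - 7}{20} - \frac{2(N^2 - 1)}{12} + \frac{N^2 - 1}{12}\right]$$

which readily reduces to

$$\frac{N(N^2 - 1)}{12} \cdot \frac{N^2 - 4}{15} = \frac{N(N^2 - 1)(N^2 - 4)}{180}$$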


Thus C has the formula given above. The formulas for the 
other coefficients can be obtained in the same way. 
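A direct application of Eqs. (6) may be sketched as follows. This is an illustration only, not the work-sheet procedure of the text: it assumes equally spaced observations with $t = T - \bar{T}$, stops at the coefficient D, and uses the hypothetical function name `fit_orthogonal`. Exact fractions are used so that the results can be compared without rounding.

```python
from fractions import Fraction as F


def fit_orthogonal(y):
    """Coefficients A, B, C, D of y' = A + B*p1 + C*p2 + D*p3 from Eqs. (6).

    y is a list of equally spaced observations; at least four are needed,
    since the divisor of D contains the factor (N^2 - 9).
    """
    N = len(y)
    N2 = N * N
    t = [F(2 * T - N - 1, 2) for T in range(1, N + 1)]  # t = T - mean(T)

    # Orthogonal polynomials of Eqs. (5), evaluated at every t.
    p1 = t
    p2 = [x**2 - F(N2 - 1, 12) for x in t]
    p3 = [x**3 - F(3 * N2 - 7, 20) * x for x in t]

    def dot(p):
        """Sum of products of a polynomial column with the data."""
        return sum(a * b for a, b in zip(p, y))

    A = F(sum(y), N)
    B = 12 * dot(p1) / (N * (N2 - 1))
    C = 180 * dot(p2) / (N * (N2 - 1) * (N2 - 4))
    D = 2800 * dot(p3) / (N * (N2 - 1) * (N2 - 4) * (N2 - 9))
    return A, B, C, D


# Five observations lying exactly on y = t^2, with t = -2, ..., 2.
print(fit_orthogonal([4, 1, 0, 1, 4]))
```

For data lying exactly on $y = t^2$ with $N = 5$, the fit should return $A = 2$, $B = 0$, $C = 1$, $D = 0$, since $t^2 = p_2 + (N^2 - 1)/12 = p_2 + 2$.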



Equations (6) could be applied by using a work sheet with
columns for the product terms indicated in order to obtain
$\Sigma ty$, $\Sigma t^2 y$, etc. Greater economy is obtained, however, by
using the subtotal-summation type of work sheet illustrated in
the preceding chapter. By using such a work sheet, an expeditious
method that involves only addition and is self-checking
has been evolved for finding A, B, C, .... A brief description
of this method, together with the mathematical analysis that
justifies its use, will now be given.

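$\alpha$ is defined as $S_1 / N$, so that

$$\alpha = \frac{S_1}{N} \qquad (7)$$

and $\alpha'$ is defined as equal to $\alpha$. Accordingly,

$$A = \alpha' = \alpha = \frac{\Sigma y}{N} = \frac{S_1}{N} \qquad \text{(i)}$$

From Eqs. (14), Chap. XXI,

$$\Sigma t y = \frac{N + 1}{2}\,S_1 - S_2$$

and, by the definition of $\alpha$,

$$\Sigma t y = \frac{N(N + 1)}{2}\,\alpha - S_2$$

If $\beta$ is now defined as

$$\beta = \frac{2S_2}{N(N + 1)} \qquad \text{(ii)}$$

then

$$\Sigma t y = \frac{N(N + 1)}{2}\,\alpha - \frac{N(N + 1)}{2}\,\beta$$

But $\Sigma t y = \Sigma p_1 y$, since $p_1 = t$, and if $\beta'$ is defined
as $\beta' = 2\Sigma p_1 y / N(N + 1)$,

$$\beta' = \alpha - \beta \qquad \text{(iii)}$$

Since $B = 12\,\Sigma p_1 y / N(N^2 - 1)$, it follows that

$$B = \frac{6}{N - 1}\,\beta' \qquad \text{(iv)}$$

Again, from Eqs. (14), Chap. XXI, it is found that

$$\Sigma t^2 y = \frac{(N + 1)^2}{4}\,S_1 - (N + 2)\,S_2 + 2S_3$$

in which $\alpha$ and $\beta$ may be substituted for equivalents, so that

$$\Sigma t^2 y = \frac{N(N + 1)^2}{4}\,\alpha - \frac{N(N + 1)(N + 2)}{2}\,\beta + 2S_3$$

But

$$p_2 = t^2 - \frac{N^2 - 1}{12}$$

Hence

$$\Sigma t^2 y = \Sigma p_2 y + \frac{N^2 - 1}{12}\,\Sigma y = \Sigma p_2 y + \frac{(N^2 - 1)N}{12}\,\alpha$$

Therefore, making substitutions in the above value of $2S_3$,

$$2S_3 = -\frac{N(N + 1)^2}{4}\,\alpha + \frac{(N^2 - 1)N}{12}\,\alpha + \frac{N(N + 1)(N + 2)}{2}\,\beta + \Sigma p_2 y$$

$$2S_3 = \frac{-3N(N + 1)^2 + N(N + 1)(N - 1)}{12}\,\alpha + \frac{N(N + 1)(N + 2)}{2}\,\beta + \Sigma p_2 y$$

$$2S_3 = -\frac{N(N + 1)(N + 2)}{6}\,\alpha + \frac{N(N + 1)(N + 2)}{2}\,\beta + \Sigma p_2 y$$

Now, if $\gamma$ is defined as

$$\gamma = \frac{6S_3}{N(N + 1)(N + 2)} \qquad \text{(v)}$$

then

$$\frac{2N(N + 1)(N + 2)}{6}\,\gamma = -\frac{N(N + 1)(N + 2)}{6}\,\alpha + \frac{3N(N + 1)(N + 2)}{6}\,\beta + \Sigma p_2 y$$

And if $\gamma'$ is defined as $\gamma' = 6\,\Sigma p_2 y / N(N + 1)(N + 2)$,
then

$$2\gamma = -\alpha + 3\beta + \gamma'$$

and

$$\gamma' = \alpha - 3\beta + 2\gamma \qquad \text{(vi)}$$

and since $C = 180\,\Sigma p_2 y / N(N^2 - 1)(N^2 - 4)$,

$$C = \frac{30}{(N - 1)(N - 2)}\,\gamma' \qquad \text{(vii)}$$

In the same manner, it can be shown that if

$$\delta = \frac{24\,S_4}{N(N + 1)(N + 2)(N + 3)} \qquad \text{(viii)}$$

and

$$\delta' = \frac{20\,\Sigma p_3 y}{N(N + 1)(N + 2)(N + 3)}$$

then

$$\delta' = \alpha - 6\beta + 10\gamma - 5\delta \qquad \text{(ix)}$$

$$D = \frac{140}{(N - 1)(N - 2)(N - 3)}\,\delta' \qquad \text{(x)}$$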


As a result of the above analysis, from Eqs. (i), (ii), (v), and
(viii), the following formulas are obtained:

$$\begin{aligned}
\alpha &= \frac{S_1}{N} \\
\beta &= \frac{2\,S_2}{N(N + 1)} \\
\gamma &= \frac{6\,S_3}{N(N + 1)(N + 2)} \\
\delta &= \frac{24\,S_4}{N(N + 1)(N + 2)(N + 3)} \\
\epsilon &= \frac{120\,S_5}{N(N + 1)(N + 2)(N + 3)(N + 4)} \\
\lambda &= \frac{720\,S_6}{N(N + 1)(N + 2)(N + 3)(N + 4)(N + 5)}
\end{aligned} \qquad (8)$$

The values of $\epsilon$ and of $\lambda$ are indicated by extension, since the
symmetrical pattern of these formulas is readily apparent.
The numerators run 1!, 2!, 3!, 4!, 5!, 6!, etc., and the
denominators run N, N(N + 1), N(N + 1)(N + 2),
N(N + 1)(N + 2)(N + 3), etc.

Similarly, from Eqs. (i), (iii), (vi), and (ix), the following
formulas are obtained:¹

$$\begin{aligned}
\alpha' &= \alpha \\
\beta' &= \alpha - \beta \\
\gamma' &= \alpha - 3\beta + 2\gamma \\
\delta' &= \alpha - 6\beta + 10\gamma - 5\delta \\
\epsilon' &= \alpha - 10\beta + 30\gamma - 35\delta + 14\epsilon \\
\lambda' &= \alpha - 15\beta + 70\gamma - 140\delta + 126\epsilon - 42\lambda
\end{aligned} \qquad (9)$$

and from Eqs. (i), (iv), (vii), and (x), the following formulas are
obtained:

$$\begin{aligned}
A &= \alpha' \\
B &= \frac{6}{N - 1}\,\beta' \\
C &= \frac{30}{(N - 1)(N - 2)}\,\gamma' \\
D &= \frac{140}{(N - 1)(N - 2)(N - 3)}\,\delta' \\
E &= \frac{630}{(N - 1)(N - 2)(N - 3)(N - 4)}\,\epsilon' \\
F &= \frac{2{,}772}{(N - 1)(N - 2)(N - 3)(N - 4)(N - 5)}\,\lambda'
\end{aligned} \qquad (10)$$

¹ For additional equations, see Fisher, op. cit., or George W. Snedecor,
Statistical Methods (1940), pp. 324-334, where the procedure is applied to
problems of curvilinear correlation in which probability interpretation is
valid.
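The chain from the sums $S_1, S_2, \ldots$ through Eqs. (8), (9), and (10) may be sketched in a few lines. The sketch rests on an assumption that should be checked against the work sheet of Chap. XXI: $S_1$ is taken as the total of the data, $S_2$ as the total of the running totals accumulated from the first observation, $S_3$ as the total of the running totals of those running totals, and so on. The function name `successive_sum_fit` is hypothetical, and the sketch stops at D.

```python
from fractions import Fraction as F
from itertools import accumulate


def successive_sum_fit(y):
    """A, B, C, D from successive sums, following Eqs. (8), (9), (10).

    Assumption: S_k is the grand total of the column obtained by
    accumulating the data (k - 1) times from the first observation.
    At least four observations are needed for D.
    """
    N = len(y)
    column, S = list(y), []
    for _ in range(4):
        S.append(sum(column))              # S_1, S_2, S_3, S_4 in turn
        column = list(accumulate(column))  # next column of running totals
    S1, S2, S3, S4 = S

    # Eqs. (8)
    a = F(S1, N)
    b = F(2 * S2, N * (N + 1))
    g = F(6 * S3, N * (N + 1) * (N + 2))
    d = F(24 * S4, N * (N + 1) * (N + 2) * (N + 3))

    # Eqs. (9)
    a1 = a
    b1 = a - b
    g1 = a - 3 * b + 2 * g
    d1 = a - 6 * b + 10 * g - 5 * d

    # Eqs. (10)
    A = a1
    B = 6 * b1 / (N - 1)
    C = 30 * g1 / ((N - 1) * (N - 2))
    D = 140 * d1 / ((N - 1) * (N - 2) * (N - 3))
    return A, B, C, D


# Five observations lying exactly on y = t^2, with t = -2, ..., 2.
print(successive_sum_fit([4, 1, 0, 1, 4]))
```

Only additions are needed to form the sums, which is the economy claimed for the method; the multiplications occur once, in Eqs. (8) and (10).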


Tables to Be Used in Orthogonal-polynomial Analysis to Save
Calculations. All the explanation necessary for the application
of the method of orthogonal polynomials to a problem has been
given. Thus, from a work sheet providing the series of sums
$S_1$, $S_2$, ..., Eqs. (8) could be used to find the series $\alpha$, $\beta$,
$\gamma$, $\delta$, ...; from these, Eqs. (9) could be used to find the series
$\alpha'$, $\beta'$, $\gamma'$, $\delta'$, ...; from these, Eqs. (10) could be used to find
the series A, B, C, .... The set of orthogonal polynomials
fitting the data according to the least-squares criterion could
then be written $y' = A + Bp_1 + Cp_2 + Dp_3 + \cdots$. From
Eqs. (5), values of $p_1$, $p_2$, $p_3$, ..., in terms of $t$ could then be
substituted, and the final equation of trend in terms of $t$ would be
found. But it is desirable to effect another economy, by use of
three tables of values that are the same for all problems having
the same number of years of data.

Thus, the use of Eqs. (8) will be greatly facilitated by the use
of Table 93, a set of constants, $\dfrac{N(N + 1)}{2}$, $\dfrac{N(N + 1)(N + 2)}{6}$,
etc., worked out for various odd values of N, that is to say, for
various numbers of years, from 11 to 41. The use of Eqs. (10)
will be greatly facilitated by referring to Table 94 for the various
values of the series of constants $\dfrac{N - 1}{6}$, $\dfrac{(N - 1)(N - 2)}{30}$,
$\dfrac{(N - 1)(N - 2)(N - 3)}{140}$, etc. And the use of Eqs. (5) will
be made easier by referring to Table 95 for the values of
$\dfrac{N^2 - 1}{12}$, $\dfrac{3N^2 - 7}{20}$, $\dfrac{3N^2 - 13}{14}$, etc.

[Tables 93, 94, and 95 (pp. 613-615): values of specified variables
dependent upon the number of years included. Odd numbers of years,
from 11 to 41. The tabulated figures are not legible.]
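Constants of the kind the text assigns to Tables 93 to 95 can be computed directly from the expressions given with Eqs. (5), (8), and (10). The sketch below is illustrative only: the function name `table_constants` and the dictionary keys are hypothetical, only the first three constants of each series are produced, and no claim is made that the layout or the number of columns matches the printed tables.

```python
from fractions import Fraction as F


def table_constants(N):
    """Leading constants for a series of N years, as exact fractions."""
    return {
        # Divisors of S2, S3, S4 in Eqs. (8)
        "eq8": [
            F(N * (N + 1), 2),
            F(N * (N + 1) * (N + 2), 6),
            F(N * (N + 1) * (N + 2) * (N + 3), 24),
        ],
        # Divisors of beta', gamma', delta' in Eqs. (10)
        "eq10": [
            F(N - 1, 6),
            F((N - 1) * (N - 2), 30),
            F((N - 1) * (N - 2) * (N - 3), 140),
        ],
        # Constants appearing in p2, p3, p4 of Eqs. (5)
        "eq5": [
            F(N * N - 1, 12),
            F(3 * N * N - 7, 20),
            F(3 * N * N - 13, 14),
        ],
    }


# Odd numbers of years, from 11 to 41, as in the text.
for N in range(11, 42, 2):
    row = table_constants(N)
    print(N, [float(v) for v in row["eq5"]])
```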



Advantages of Method of Orthogonal Polynomials. The method
of orthogonal polynomials is a great timesaver whenever a trend
of higher order than a second-degree polynomial is fitted. While
it has required several pages to describe the method, it will be
noted that the actual solution of a problem requires little more
than a page of figures besides the work sheet. This is
illustrated in Chap. XXIV.

But the saving of time is not the sole advantage of the method
of orthogonal polynomials. In addition, the set of orthogonal
polynomials that is obtained when values for A, B, C, D, ...,
are obtained, that is to say,

$$y' = A + Bp_1 + Cp_2 + Dp_3 + \cdots$$

constitutes the solution for any one of several trend lines.
Thus $y' = A + Bp_1$ is the straight-line trend; the addition of
$Cp_2$ gives the second-degree polynomial trend; the addition of
$Dp_3$ gives the third-degree polynomial trend; etc. It is not
necessary to recalculate values for A, B, C, ..., for the various
trends required. If a problem has been worked out to include
solutions for A, B, C, and D and subsequently it is decided that
E is required, it can be found by adding one more column to
the work sheet and finding the value of E without recalculating
the values of A, B, C, and D.

This convenience of obtaining several types of trends from
one orthogonal set comes from the fact that the terms of the
orthogonal equation are linearly uncorrelated with each other.¹
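The reason may be sketched in a line of algebra. What follows is a reconstruction from the orthogonality conditions and the least-squares criterion, not a quotation of the text's own equations; the symbols $K_j$ (standing for the coefficients A, B, C, ...) and $p_0 = 1$ are introduced here only for the illustration.

```latex
% Least-squares condition for the coefficient K_j of p_j:
\frac{\partial}{\partial K_j}\,\Sigma\Bigl(y - \sum_i K_i p_i\Bigr)^2 = 0
\quad\Longrightarrow\quad
\Sigma p_j y = \sum_i K_i\,\Sigma p_i p_j .
% Orthogonality: \Sigma p_i p_j = 0 whenever i \neq j, so only one term survives:
\Sigma p_j y = K_j\,\Sigma p_j^2
\quad\Longrightarrow\quad
K_j = \frac{\Sigma p_j y}{\Sigma p_j^2} .
```

Each coefficient thus depends only on its own polynomial and the data, so that adding or dropping a higher term leaves the coefficients already found unchanged. With ordinary powers of $t$ the cross products do not vanish, and every coefficient must be recomputed when the degree is changed.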


¹ See p. 600.



CHAPTER XXIII


TIME-SERIES ANALYSIS— SEASONAL VARIATION 

Historical Background. The second major stimulus to the
development of methods for analyzing time series, listed at the
beginning of Chap. XX, was the troublesome effects of seasonal
variations in economic activity. Writers on labor problems
stress the evil effects for labor of wide seasonal fluctuations in
some employments. The effects of seasonal variations upon the
banking and credit system were emphasized during the nineteenth
century and the early part of the twentieth century. Even as
early as 1793, Alexander Hamilton advised that redemption of
the public debt be carried on during the winter, for, said he,
“it is a familiar fact that during the winter in this country, there
is always a scarcity of money in the towns, a circumstance
calculated to damp the price of stock.”¹

Jevons made an analysis of the effects of the “autumnal
pressure” on the London money market and calculated the average
monthly fluctuations in currency movement between the Bank
of England and its branches (1855-1862) and the average
monthly excess of payments or receipts of British coin at the
Bank of England for the same period.² In 1890, George Clare
analyzed the seasonal variations for the period from 1881 to
1890 in the circulation of the Bank of England, in public deposits,
in “other deposits,” in “other securities,” in the “reserve,” and
in the “internal gold movements.”³ In 1902, J. P. Norton
published a study of the New York money market in which he computed

¹ 28th Congress, 1st Session, Executive Document, 15, p. 199. Cf. Myers,
Margaret G., The New York Money Market, Vol. 1, Origins and Development,
p. 208. Other early references to seasonal fluctuations are Hunt's
Merchants' Magazine, Vol. 20, p. 302, Vol. 39, p. 582; Journal of Commerce,
Aug. 3, 1846.

² Investigations in Currency and Finance (Foxwell ed., 1909), pp. 158-159.
Cf. Mitchell, W. C., Business Cycles: The Problem Stated and Its Setting
(1928), pp. 199, 236.

³ A Money-Market Primer (2d ed.), pp. 19, 21, 31, 42, 53, 55. Cf. Smith,
James G., Benjamin H. Beckhart, and William A. Brown, The New York
Money Market, Vol. 4, External and Internal Relations, p. 421.

the seasonal variation in loans, the peaks occurring on
Mar. 1 and in July and December and the low points occurring
at the beginning of the year, in May, and at the end of
November.¹ The outstanding statistical analysis of seasonal variations
in the New York money market before the First World War is
that prepared for the National Monetary Commission in 1910
by Prof. E. W. Kemmerer.² In this study he analyzed seasonal
variations in money rates, exchange rates, bond yields, currency
movements, and deposits. His analysis brought out the
seasonal relationships in a striking manner, in spite of very strict
limitations in available data at the time. Much of his work is
based upon data gathered by the questionnaire method.

Causes of Seasonal Variation. Two types of underlying forces
cause seasonal variations in economic activity: (1) climatic
conditions giving rise to seasons in agricultural production, in
out-of-door construction work, in the manufacture of clothing, in
the use of fuel, and in traveling, etc.; and (2) forces arising from
convention, such as the Christmas and Easter trade and
seasonal style convention.³ The effects of these various basic
seasonal influences upon the New York money market and upon
the banking and credit structure of the United States have
recently been exhaustively studied and published in Vol. 4 of the
previously mentioned studies of The New York Money Market,
edited by Prof. Benjamin H. Beckhart of Columbia University.⁴
In large part the movement for banking reform in this country,
which culminated in the studies of the National Monetary
Commission and the Federal Reserve Act of 1913, was the result of
the evil effects of seasonal fluctuations in the demands of trade
giving rise to periodical stringencies in the money market and
frequently initiating monetary panics. Consequently, it was one of
the most important aims of the Federal Reserve System to devise
an elastic currency and credit system that would accommodate
these seasonal demands.⁵ Thus banking reform in the United

¹ Statistical Studies in the New York Money Market, pp. 62-64.

² Seasonal Variations in the Relative Demand for Money and Capital in
the United States (National Monetary Commission Publications), Vol. 22.

³ Cf. Mitchell, op. cit., pp. 236-240.

⁴ The New York Money Market, Vol. 4, External and Internal Relations,
pp. 417-542.

⁵ The New York Money Market, Vol. 2, Sources and Movements of Funds,
pp. [illegible]-374.



States is a case in which a long-recognized evil was finally
statistically measured and evaluated and a reform in the system
definitely resulted in improvement.

Not only in the field of banking has the study of seasonal
variation by statistical methods been stimulated. In addition,
unemployment with all its economic, social, and psychological
implications has aroused great concern about the measurement of
such variation. Extended reference to the problem of seasonal
unemployment was made at former President Hoover's
Conference on Unemployment, in the Report and Recommendations
of the Committee to Investigate Business Cycles and
Unemployment.¹

In the hearings before the Committee on Education and
Labor, of the United States Senate, in 1928-1929, much material
and discussion are devoted to the subject of the seasonal
variations in employment in industries and trade.² Franklin D.
Roosevelt, when governor of New York State, appointed a
Committee on the Stabilization of Industry for the Prevention of
Unemployment, which made its report to him in November,
1930, entitled Less Unemployment through Stabilization of
Operations, in which the subject of seasonal variations in
employment constituted an important part.

During the years leading up to the depression of the 1930's,
much was written on seasonal variation in employment and its
contemplated stabilization. Thereafter, the problem of cyclical
unemployment and its solution by means of unemployment
insurance and the entire social security program dominated the
scene.³

¹ New York, 1923, pp. 0, 116-120, 101, 215.

² 70th Congress, 2d Session, “Unemployment in the United States,”
S.R. 219.

³ Smith, Edwin S., Reducing Seasonal Unemployment, The Experience of
American Manufacturing Concerns (1931). Douglas, Paul H., and Aaron
Director, The Problem of Unemployment. This book devotes pp. 73-118
to the subject of seasonal variations and regularization of industry to
stabilize such fluctuations. Hansen, Alvin H., and Tillman M. Sogge,
Seasonal Irregularity of Employment in Minneapolis, St. Paul and Duluth
(Employment Stabilization Research Institute, November, 1931).
Berridge, W. A., “Employment and Income of Labor in the United States,”
in International Unemployment (a study of fluctuations in employment and
unemployment in several countries, 1910-1930; Industrial Relations
Institute, The Hague, Netherlands, 1932).



A familiar example of seasonal activity in the economic sphere
is construction activity, which gives not more than two-thirds
as much employment in the winter months, on the average, as
in the summer. Some important manufacturing industries, too,
such as the automobile, agricultural implements, and ready-made
clothing industries, show a considerable seasonal fluctuation.
To be sure, the busy season in some industries comes in the dull
season for others, a fact that tends to level out the differences
between the number employed in industry in its entirety in
one month as compared with another. But this does not mean
that the workers released by one industry are absorbed by
another to a sufficient degree or with sufficient promptitude to
obliterate the variations from month to month in the amount
of their employment. Barriers of specialized skill, geography,
and attachment to particular occupations and localities prevent
anything like the dovetailing suggested by the figures of the
total number employed.¹ Consequently, the statistics of total
employment may show little seasonal variation, while at the
same time large degrees of seasonal unemployment exist in many
parts of the total. The fact that there is no seasonal variation
or little seasonal variation in total employment does not solve the
unemployment problem for the seasonally unemployed worker.

One reason why concern, statistically speaking, about the
subject of seasonal variations in employment has been stimulated
is that the opinion prevails that this particular type of
unemployment is in large part avoidable. The movement to
inaugurate unemployment insurance in the United States was
partly based upon the belief that such a measure for the relief
of unemployment would tend to regularize industries affected
by seasonal unemployment. It is recognized that the greater
problem of cyclical unemployment is less easily solved. The
literature on the subject of unemployment insurance in the
United States makes it clear that the movement is directed
particularly toward the regularization of industry to eliminate as
much as possible of the seasonal fluctuation in employment.²

With these problems in mind, students of the labor problem
asked: What types of business are responsible for the largest

¹ McCabe, David A., chapter on Unemployment, in Facing the Facts (a
symposium, 1932), pp. 324-325, 338-331.

² Cf. McCabe, op. cit., pp. 344-346, 350.



part of this seasonal irregularity in employment? What are the
peak and slack seasons of employment in different businesses, and
what are the amplitudes of fluctuations? What can be done to
make business less seasonal and less irregular? What is the cost
of regularization plans in an industry, and how do such costs
compare with the savings resulting from more regular use of capital
investment? These are the types of exceedingly practical
problems presenting themselves in this field of economics, and they
have stimulated statistical research to take measurements of
seasonal variations. They are of practical significance to
employers and to investors and to workers. They are of great
social and psychological significance to the social scientist, the
economist, and the political theorist.

Methods of Measuring Seasonal Variations

It has been seen that the method of discovering trends either
for their own sake (rational trends) or in order to remove them
from the data, i.e., to get rid of them (empirical trends), has been
based upon curve-fitting technique. The technical problem
involved is a simple one even though the mathematics may be
complex in some cases. The simplicity of the idea is somewhat
offset, however, by the irrational character of the procedure.
This is a troublesome factor because it is the function of the
statistician not only to apply mathematical analysis to statistics
but also to explain what he does and why he does it. Enough
has been included in Chaps. XX and XXI to indicate the
general character of this problem.

In the case of seasonal variation, the difficulties of the
statistician are just the reverse; for while it has been possible to build
up a perfectly rational procedure, upon the basis of the theory of
averages, the technical problem involved has been found to be
a complex one. The rational concept underlying the procedure of
measuring seasonal variation is that, where a time series has a
characteristic seasonal variation occurring year after year, it
should be quite reasonable to depict a “typical,” or average,
seasonal variation for that time series.

In its abstract aspect, therefore, the concept is perfectly
rational. Homogeneous variates are to be averaged to obtain
a type. For example, it is proposed to average the amount
by which January data are higher or lower than those of other



months of the year, to average the amount by which February
data are higher or lower than those of other months of the
year, and so on, until a picture of this “type” of periodical
movement that occurs every year is obtained, although each year may
be slightly different from the type. Moderate variations from
the type are quite consonant with the theory of averages and
their application to the problem of measuring seasonal variation.¹

When the rational procedure is to be put into effect, however,
difficulties of a technical character arise. A time series of raw
data that by a priori knowledge should have a distinctively
regular seasonal variation may be selected. A graph of the time
series is made, and a seasonal variation occurring every year is
revealed, but the seasonal periodicity in the raw data is distorted
by other movements, namely, trend and cycle. This was noted
at the beginning of Chap. XX, where a hypothetical time series
was constructed. It is clear that the data in their raw state
cannot be averaged to find the typical seasonal periodicity.
That is to say, January, 1937, is not homogeneous with respect
to seasonal variation with January, 1940, because the relative
position of the respective Januaries (1) as to trend and (2) as to
cycles is not comparable. In other words, averaging the raw
data of all the Januaries in a series of data, all the Februaries,
etc., for the 12 months of the year would be an irrational
procedure. This would not accord with the rational idea of seasonal
variation outlined above because the averages of raw data would
include averages of something in addition to seasonal variation.

Problem of Isolating Seasonal Variation. To average the
actual seasonal variation, it must be isolated from the other
types of variation in the raw data. The technical problem
involved in the measurement of seasonal variation is thus how to
isolate from the raw data that part of its fluctuation that is
essentially seasonal in character. When these other types of

¹ If the seasonal variation were measured weekly rather than monthly,
the principle would be the same. The same principle may be used to
measure periodicity by days within the month or within the week, and it
may likewise be used to measure periodicity by hours within the day.
Thus periodicity by days within the month of wage payments might have
great economic value for some problems, and periodicity by hours within
the day of consumption of electrical power might have significance in
connection with some problems. Seasonal variation is only one type of
periodicity that can be measured by this method.



fluctuation have been removed from the raw data, by subtraction
or division, all the residual Januaries can be rationally averaged,
all the residual Februaries can be averaged, etc., in order to
obtain a picture of the average position, respectively, of each
month.

How can this be done? There are several answers to this
question, and there is controversy as to just what is the best
technical procedure. In his notable studies of seasonal variation
made about 1910, Prof. Kemmerer devised a method for
measuring seasonal variation separate from other fluctuations. At the
time, it was the best that had been suggested.¹

Another famous suggestion as to a method of isolating the
seasonal periodicities from other types of fluctuations was made
by W. M. Persons, when from 1915 to 1919 he developed his
approach to the problems of time-series analysis, culminating
in the establishment of the Harvard Economic Society's business
barometer and the Review of Economic Statistics. Persons'
method, called the “link relative method,” expresses each
monthly figure as a relative of the immediately preceding month;
the seasonal pattern is found by averaging all the link relatives
for the same month and taking any residual trend out of the chain
relatives computed from these average link relatives.²

A third method of isolating the seasonal fluctuations and
measuring them by an index of seasonal variation is that
advocating simply the removal of trend from the data and then the
averaging of the monthly ratio differences from the trend.³
While this method removes the nonhomogeneous effects of trend,
it does not remove those due to cyclical fluctuations. If taken
over a sufficient period of time, the bias of the cyclical fluctuations
will cancel so that a true index of seasonal variations would be

¹ Op. cit. Cf. criticism of Kemmerer's method by W. L. Hart, Journal of
the American Statistical Association, Vol. 17 (1922). Kemmerer's work
constitutes an important pioneer effort to solve the technical difficulties
involved and helped direct attention to better solutions.

² Review of Economic Statistics, January, 1919, pp. 18-31; Indices of
Business Conditions (1919). Cf. Rietz, H. L., Handbook of Mathematical
Statistics, pp. 151-155.

³ Falkner, Helen D., “The Measurement of Seasonal Variation,”
Journal of the American Statistical Association, Vol. 19 (1924), pp. 167-179;
Robb, Richard A., “Variate Difference Method of Seasonal Variation,”
ibid., Vol. 24 (1929), pp. 250-257.



obtained by averaging. One of the discoveries of recent years,
however, is that seasonal variations change in the same time
series from one era to another owing to new conditions; for such
time series an index of seasonal variation based on a period of
time covering two or more eras would be comparatively useless.
Thus the criticism of the ratio-difference-from-the-trend method
is that, if taken over a sufficient period of time to make it a
valid measurement of seasonal variation, it would be taken for
too long a time, i.e., that two or more eras of typical seasonal
fluctuation might be confused.

A number of other methods have been suggested, based
upon the principles that have been outlined.¹ The most widely
used and probably the best method is the 12 months' moving
average method, of which a number of refinements have been
suggested. Since this method is the one most extensively used,
it is now described in detail and an illustration will be given.

Twelve Months' Moving Average Method. This method
consists of the following steps:

1. Calculate a 12 months' moving average of the raw data,
centering the moving average at the seventh month; thus,
opposite July of the first year would be the average of the
12 months of that year, opposite August would be the average
of the last 11 months of that year and the first month of the next
year, and so on.

2. Divide the raw data serially by the 12 months' moving
average. Inasmuch as the moving average would contain
in it the elements both of trend and of major and minor cycles,
the residuals of the raw data from the moving average (either
by subtraction or division) would contain purely seasonal
fluctuations.

¹ King, W. I., “An Improved Method for Measuring the Seasonal
Factor,” Journal of the American Statistical Association, Vol. 19 (1924), pp.
301-313; Carmichael, F. L., “Methods of Computing Seasonal Indexes,
Constant and Progressive,” ibid., Vol. 22 (1927), pp. 339-354; Joy, Aryness,
and Woodlief Thomas, “Use of Moving Averages in the Measurement of
Seasonal Variation,” ibid., Vol. 23 (1928), pp. 241-252; Baumann, A. O.,
“Thirteen-months Ratio-first-difference Method of Measuring Seasonal
Variation,” ibid., Vol. 23 (1928), pp. 282-290; Kuznets, Simon, “Seasonal
Patterns and Seasonal Amplitudes: Measurement of Their Short-time
Variations,” ibid., Vol. 27 (1932), pp. 9-20; Riggleman, John R., and Ira
N. Frisbee, Business Statistics (1932), pp. 226-242.




3. Make frequency distributions of the several months (see 
the example at the end of the chapter). 

4. Find the median relatives for each month, using the median 
to avoid the influence of extreme fluctuations. 

5. Express these median relatives as a percentage of their own 
average, thus giving an index of seasonal variation. 

As a short cut, inasmuch as the result will be precisely the 
same, the 12 months’ moving total may be used instead of the 
12 months’ moving average, thus saving the division throughout 
by 12. 
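The five steps above, with the moving-total short cut, can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `seasonal_index` and its argument are assumptions made for the example, the series is assumed to begin with a January, and at least 23 months are needed so that every calendar month receives at least one relative.

```python
from statistics import median


def seasonal_index(monthly):
    """Seasonal index (average = 100) by the 12 months' moving total method.

    `monthly` is a list of monthly values whose first item is a January.
    Returns 12 index numbers, January first.
    """
    ratios = [[] for _ in range(12)]           # step 3: one array of relatives per month
    # The window starting at item k covers items k..k+11 and is entered
    # opposite its seventh item, k+6 (July for a January-December window).
    for k in range(len(monthly) - 11):
        total = sum(monthly[k:k + 12])         # step 1, using the total instead of the average
        centre = k + 6
        ratios[centre % 12].append(100.0 * monthly[centre] / total)   # step 2
    medians = [median(r) for r in ratios]      # step 4: median relative for each month
    mean_of_medians = sum(medians) / 12
    return [100.0 * m / mean_of_medians for m in medians]             # step 5


# A made-up series with a fixed seasonal pattern and no trend or cycle:
pattern = [90, 95, 100, 105, 110, 100, 95, 90, 100, 105, 110, 100]
print([round(x, 1) for x in seasonal_index(pattern * 3)])
```

For such an artificial series the index simply returns the pattern itself, which is a convenient check that the centering at the seventh month has been coded correctly.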

Problem Illustrating Measurement of Seasonal Variation.
Calculating the Index of Seasonal Variation by the 12 Months’
Moving Average Method. The time series of monthly data on
consumer installment-sale debt for household appliances in the
United States has been selected to illustrate the calculation of an
index of seasonal variation by the 12 months’ moving average
method. Table 96 is a work sheet for the calculations necessary
to the problem. The data were recorded on this work sheet for
the years 1929-1942 by months, the raw data appearing in
column (1). Next, a 12 months’ moving total was calculated;
this appears in column (2), the moving total being “centered at
the seventh month.” For example, the figure 2,930 after July,
1929, in column (2) of the work sheet is the total of the 12
monthly figures for 1929; the figure 2,972 (opposite August,
1929) is the total of the next 12 monthly figures, beginning with
February, 1929, and ending with January, 1930. Opposite each
July is the total for that year; this constitutes a good cross
check in the construction of the moving total.

To calculate the moving total, first put the 12 monthly figures
for 1929 in the adding machine, and take a subtotal; then
subtract the datum for January, 1929, and add the datum for
January, 1930, and take a subtotal; then subtract the datum for
February, 1929, and add the datum for February, 1930, and
take a subtotal; and so on, until the end of the time series.
Clear the machine, and then add independently the last 12
months of the time series; this should check with your last
subtotal. If it does not check, a mistake has been made, which
can be most readily found by checking up on the July subtotals
for each year, beginning with the last one and going back until
you find the mistake. These subtotals are the 12 months’



Table 96. Work Sheet for Calculating Index of Seasonal Variation
Data: Consumer installment-sale debt, monthly, for household appliances,
end of month
(In millions of dollars)

                      (1)          (2)             (3)
Year and month      Monthly    12 months'     Raw data divided
                    raw data   moving total   by 12 months'
                               centered at    moving total,
                               7th month      per cent

1929:
  January             207
  February            199
  March               199
  April               217
  May                 237
  June                260
  July                273        2,930         9.32
  August              274        2,972         9.22
  September           272        3,006         9.05
  October             266        3,031         8.78
  November            261        3,043         8.58
  December            265        3,043         8.71
1930:
  January             249        3,031         8.22
  February            233        3,010         7.74
  March               224        2,984         7.51
  April               229        2,953         7.75
  May                 237        2,919         8.12
  June                248        2,881         8.61
  July                252        2,838         8.88
  August              248        2,800         8.86
  September           241        2,766         8.71
  October             232        2,734         8.48
  November            223        2,701         8.26
  December            222        2,666         8.33
1931:
  January             211        2,628         8.03
  February            199        2,588         7.69
  March               192        2,548         7.54
  April               196        2,509         7.81
  May                 202        2,471         8.17
  June                210        2,434         8.63
  July                212        2,397         8.84
  August              208        2,357         8.82
  September           202        2,318         8.71
  October             194        2,275         8.53
  November            186        2,225         8.36
  December            185        2,167         8.54
1932:
  January             171        2,101         8.14
  February            160        2,028         7.89
  March               149        1,954         7.63
  April               146        1,881         7.76
  May                 144        1,813         7.94
  June                144        1,749         8.23
  July                139        1,685         8.25
  August              134        1,628         8.23
  September           129        1,575         8.19
  October             126        1,528         8.25
  November            122        1,484         8.22
  December            121        1,447         8.36
1933:
  January             114        1,418         8.04
  February            107        1,398         7.65
  March               102        1,386         7.36
  April               102        1,378         7.40
  May                 107        1,372         7.80
  June                115        1,367         8.41
  July                119        1,365         8.72
  August              122        1,364         8.94
  September           121        1,365         8.86
  October             120        1,370         8.76
  November            117        1,384         8.45
  December            119        1,403         8.48
1934:
  January             113        1,422         7.95
  February            108        1,441         7.49
  March               107        1,456         7.35
  April               116        1,468         7.90
  May                 126        1,479         8.52
  June                134        1,490         8.99
  July                138        1,502         9.19
  August              137        1,515         9.04
  September           133        1,528         8.70
  October             131        1,544         8.48
  November            128        1,561         8.20
  December            131        1,578         8.30
1935:
  January             126        1,600         7.88
  February            121        1,626         7.44
  March               123        1,657         7.42
  April               133        1,692         7.86
  May                 143        1,728         8.28
  June                156        1,768         8.82
  July                164        1,808         9.07
  August              168        1,845         9.10
  September           168        1,882         8.93
  October             167        1,921         8.69
  November            168        1,964         8.55
  December            171        2,017         8.48
1936:
  January             163        2,075         7.86
  February            158        2,143         7.37
  March               162        2,211         7.33
  April               176        2,281         7.72
  May                 196        2,353         8.33
  June                214        2,428         8.81
  July                232        2,512         9.24
  August              236        2,596         9.09
  September           238        2,683         8.87
  October             239        2,772         8.62
  November            243        2,863         8.49
  December            255        2,952         8.64
1937:
  January             247        3,045         8.11
  February            245        3,130         7.83
  March               251        3,217         7.80
  April               267        3,302         8.09
  May                 285        3,382         8.43
  June                307        3,451         8.90
  July                317        3,503         9.05
  August              323        3,552         9.09
  September           323        3,594         8.99
  October             319        3,624         8.80
  November            312        3,639         8.57
  December            307        3,636         8.44
1938:
  January             296        3,610         8.20
  February            287        3,571         8.04
  March               281        3,525         7.97
  April               282        3,475         8.12
  May                 282        3,420         8.24
  June                281        3,371         8.34
  July                278        3,330         8.35
  August              277        3,290         8.42
  September           273        3,253         8.39
  October             264        3,218         8.20
  November            263        3,183         8.26
  December            266        3,154         8.43
1939:
  January             256        3,133         8.12
  February            250        3,120         8.01
  March               246        3,110         7.91
  April               247        3,103         7.96
  May                 253        3,104         8.15
  June                260        3,106         8.37
  July                265        3,113         8.51
  August              267        3,119         8.56
  September           266        3,124         8.51
  October             265        3,131         8.46
  November            265        3,143         8.43
  December            273        3,161         8.64
1940:
  January             262        3,182         8.23
  February            255        3,205         7.96
  March               253        3,232         7.83
  April               259        3,259         7.95
  May                 271        3,284         8.25
  June                281        3,309         8.49
  July                288        3,338         8.63
  August              294        3,366         8.73
  September           293        3,397         8.62
  October             290        3,430         8.45
  November            290        3,474         8.35
  December            302        3,523         8.57
1941:
  January             290        3,572         8.12
  February            286        3,620         7.90
  March               286        3,672         7.79
  April               303        3,721         8.14
  May                 320        3,764         8.50
  June                330        3,794         8.70
  July                336        3,805         8.83
  August              346        3,809         9.08
  September           342        3,808         8.98
  October             333        3,794         8.78
  November            320        3,749         8.54
  December            313        3,670         8.53
1942:
  January             294        3,559         8.26
  February            285        3,425         8.32
  March               272        3,262         8.34
  April               258        3,089         8.35
  May                 241
  June                219
  July                202
  August              183
  September           169
  October
  November
  December

Source: Holthausen, Duncan McC., "Monthly Estimates of Short-term Consumer
Debt, 1929-1942," Survey of Current Business, Vol. 22 (November, 1942), pp. 9-25.


moving total and can be tabulated in column (2) of the work
sheet, as in Table 96. The next step is to divide each monthly
raw datum by the corresponding moving total figure, expressing
the answer as a percentage figure in column (3). The figures in
column (3) are then tabulated in a system of frequency arrays
as in Fig. 147.
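The adding-machine routine for the moving total (drop the oldest month, add the newest, then verify against an independent sum of the last 12 months) can be sketched as follows. The function name is an assumption made for the example. The figures are column (1) of Table 96 for 1929 and 1930, with October 1929 taken as 266, the value implied by the printed 1929 total of 2,930.

```python
def moving_totals(monthly):
    """12 months' moving totals by the subtract-one, add-one procedure."""
    total = sum(monthly[:12])                  # first subtotal: the first 12 months
    totals = [total]
    for i in range(12, len(monthly)):
        total += monthly[i] - monthly[i - 12]  # drop the oldest month, add the newest
        totals.append(total)
    # Independent check: the last 12 months added afresh must equal the last subtotal.
    if totals[-1] != sum(monthly[-12:]):
        raise ValueError("moving total does not check")
    return totals


# Column (1) of Table 96, January 1929 through December 1930 (millions of dollars).
raw = [207, 199, 199, 217, 237, 260, 273, 274, 272, 266, 261, 265,
       249, 233, 224, 229, 237, 248, 252, 248, 241, 232, 223, 222]

totals = moving_totals(raw)
print(totals[:3])   # entries opposite July, August, and September 1929
print(totals[-1])   # entry opposite July 1930, i.e., the 1930 total
```

The first subtotal is entered opposite July of the first year and each later one opposite the following month, so the list lines up with column (2) starting at July 1929.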

From Fig. 147 the median monthly relatives are read and
arranged as in Table 97.






Fig. 147. Frequency arrays, one for each month, of distributions of monthly
ratios of raw data to 12 months’ moving total. Column (3) of Table 96.
Consumer installment-sale debt for household appliances in the United States,
1935-1942.

Column (1) of Table 97 consists of the median relatives read 
from Fig. 147; and these median relatives have only to be 

Table 97. Index of Seasonal Variation in Consumer Installment-sale
Debt for Household Appliances in the United States

Month            Medians     Index of seasonal
                             variation¹

January            8.13           96.2
February           7.95           94.0
March              7.82           92.5
April              8.05           95.2
May                8.25           97.6
June               8.72          103.2
July               8.85          104.7
August             9.10          107.6
September          8.88          105.0
October            8.64          102.2
November           8.50          100.6
December           8.55          101.1

Total            101.44        1,200.0
Average            8.4533+

¹ This column consists of the medians expressed as percentages of their
average. Thus 8.13 is 96.2 per cent of 8.4533+.

expressed as percentages of their own average to give the index
of seasonal variation. This is done, giving the figures in column





Fig. 148. Studies in seasonal variations in commercial paper rates before and
since the establishment of the Federal Reserve System.


fortuitous and chance fluctuations. By averaging, the chance
fluctuations are canceled out, leaving in the index a description
of relative seasonal movement. The theory of this method is
based, of course, upon the use of a 12 months' moving average,
but precisely the same arithmetical results are obtained by using
the moving total instead and with a saving of a considerable number
of division processes. In dividing by the 12 months’ moving
total instead of by the 12 months’ moving average, the average
residual percentage is 8.333+, whereas, in dividing by the
12 months’ moving average, the average residual percentage
would be 12 times 8.333+, or 100.00.
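The claim that the moving total and the moving average lead to the same index can be checked on the medians of Table 97. The short sketch below is illustrative only; the variable names are assumptions, and the October median is taken as 8.64.

```python
# Medians read from Fig. 147 (first column of Table 97), January to December.
medians = [8.13, 7.95, 7.82, 8.05, 8.25, 8.72, 8.85, 9.10, 8.88, 8.64, 8.50, 8.55]

# Relatives taken on the moving total: the medians average about 8.45.
average = sum(medians) / 12
index_from_totals = [100 * m / average for m in medians]

# Relatives taken on the moving average are each 12 times as large,
# so the factor of 12 cancels when they are put on their own average.
medians_on_average = [12 * m for m in medians]
average_on_average = sum(medians_on_average) / 12
index_from_averages = [100 * m / average_on_average for m in medians_on_average]

print(round(average, 4))
print([round(x, 1) for x in index_from_totals])
```

Because each relative is divided by the average of all twelve, any constant multiplier applied to the whole column drops out, which is why the division by 12 may be skipped throughout.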

From the multiple frequency array, as in Figs. 147 and 148, it
can be determined whether or not the seasonal variation is well
defined. If the course of all the recorded ratios of raw data to
the 12 months’ moving total by months tends to run close to the
course of the medians, then the seasonal variation is a
well-defined one. If, however, the points are scattered in a wide
range from the medians and the general swing of the data does
not correspond to the movements of the median line, then the
seasonal variation is not well defined. Such a result might be
obtained if the type of the seasonal variation were changing, and
in that case the data may be studied in groups of a smaller
number of years. Figure 148 is included to present examples of
poorly defined seasonal variation, as compared with well-defined
cases of seasonal variation. The data studied are commercial
paper rates in the New York money market before and after
the inception of the Federal Reserve System. From the figure
it is seen how well defined the seasonal variation in commercial
paper rates was before the beginning of the Federal Reserve
System, namely, for the periods 1904-1909 and 1909-1914.
Also, it is seen how poorly defined is the seasonal variation for
the periods 1920-1925 and 1925-1930, so poorly that there
could hardly be said to have been any consistent seasonal
periodicity whatever.¹

METHOD OF DETECTING CHANGING SEASONAL VARIATION 

Figure 149 is drawn to discover whether or not, during the
years from 1929 to 1941, the seasonal variation in consumer
installment debt for household appliances has changed.² The

¹ For a more complete discussion see The New York Money Market, Vol.
4, pp. 510-530.

² For other suggested methods of measuring changing seasonal variation
see Julius Shiskin, “A New Multiplicative Seasonal Index,” Journal of the
American Statistical Association, Vol. 37 (1942), pp. 507-516; Henry A.
Latané, “Seasonal Factors Determined by Difference from Average of
Adjacent Months,” Journal of the American Statistical Association, Vol.
37 (1942), pp. 517-522; Dudley J. Cowden, “Moving Seasonal Indexes,”
Journal of the American Statistical Association, Vol. 37 (1942), pp. 523-524.



[Fig. 149. Ratios of raw data to 12 months' moving total, plotted in a
separate graph for each month (January, February, etc.).]

figure is a plot of the ratios to 12 months’ moving total shown
in the last column of Table 96; a separate graph for each month
has been drawn in Fig. 149.

The straight-line trends for each month were drawn in “at
sight”; they were not fitted by a mathematical method. Figure
149 shows that the relative seasonal position of January, June,
October, November, and December remained about the same
during this period of years. But the relative seasonal amount of
consumer installment debt was rising in the months of February,
March, April, and May, while the relative seasonal quantity of
consumer installment debt was declining in the months of
July, August, and September.

Consequently, a more refined index of seasonal variation
than the average of a period of years such as that shown in
Table 97 and Fig. 148 can be obtained from Fig. 149. In fact,
since these trends exist, a different index of seasonal variation
for each year is required. For 1942 this index of seasonal
variation can be obtained as indicated in Table 98.

Table 98. Computation of Index of Seasonal Variation in Consumer
Installment-sale Debt for Household Appliances, 1942

Month            Ratios read         Index of
                 from trend lines    seasonal
                 in Fig. 149         variation¹

January              8.10              96.4
February             8.09              96.2
March                8.08              96.1
April                8.15              97.0
May                  8.35              99.3
June                 8.60             102.3
July                 8.65             102.9
August               8.75             104.1
September            8.65             102.9
October              8.54             101.6
November             8.40              99.9
December             8.50             101.1

Total              100.86           1,200.0
Average              8.405

¹ Obtained by expressing ratios in the first column as percentages of their
own average.
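The computation in Table 98 can be sketched in Python as below. Two points are assumptions and not the authors' procedure. First, the trend lines of Fig. 149 were drawn at sight; the helper `fit_line` substitutes an ordinary least-squares line, shown here for the May ratios of Table 96 (1930-1941) only, so its 1942 reading need not agree with the 8.35 read from the chart. Second, the twelve ratios are those of Table 98 as reconstructed from its printed total (100.86) and index column.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


# May ratios, column (3) of Table 96, 1930 through 1941.
years = list(range(1930, 1942))
may_ratios = [8.12, 8.17, 7.94, 7.80, 8.52, 8.28, 8.33, 8.43, 8.24, 8.15, 8.25, 8.50]
a, b = fit_line(years, may_ratios)
may_1942 = a + b * 1942            # fitted trend value carried forward to 1942

# Ratios read from the trend lines for 1942 (first column of Table 98),
# put on their own average to give the 1942 index.
ratios_1942 = [8.10, 8.09, 8.08, 8.15, 8.35, 8.60, 8.65, 8.75, 8.65, 8.54, 8.40, 8.50]
average_1942 = sum(ratios_1942) / 12
index_1942 = [100 * r / average_1942 for r in ratios_1942]

print(round(b, 4), round(may_1942, 2))
print([round(x, 1) for x in index_1942])
```

A positive slope for May agrees with the statement that the relative seasonal position of May was rising over the period.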




CHAPTER XXIV 
DETERMINATION OF CYCLE 


Usually it is desirable to have current figures on a monthly 
basis, and to know how actual experience compares with what 
should be expected for the season and with normal growth. 
Can we estimate our position in the business cycle from month 
to month? Annual data adjusted for trend and a picture of 
undistorted seasonal variation, illustrated in the preceding 
chapters, do not go quite far enough. It is often necessary 
to remove trend and seasonal variation from monthly data in 
order to determine position in the cycle. 

Cycle Determined by Adjusting Monthly Data. When monthly,
instead of annual, data are analyzed, the empirical trend may be
found by setting up a work sheet similar to Table 83 or 86 (pages
582, 588), depending upon the type of trend selected. The
trend is then fitted by the method of least squares in a manner
precisely similar to that demonstrated for annual data. Of
course, if quite a number of years of monthly data are thus
treated, the calculations become very extended, but the principle
remains the same.¹ It is possible, however, to derive an
approximation of the monthly trend equation from an annual trend
equation. This is explained in the present chapter and may
serve as an economizer of time in the analysis of monthly data.

Determination of Cycle in Annual Data. While the purpose
of this chapter is to present a method for measuring the cycle
in monthly data, it may be noted at the start that even if the
object of analysis is to determine cycle in monthly data it is
desirable first to study the annual data. Not only is this true
because the monthly trend may be easily estimated from the
annual trend, but it is also desirable because the general character

¹ Where trends are calculated for monthly data, involving long series,
convenient short-cut methods of calculation have been devised. Cf. Ross,
P. A., “Formulae for Facilitating Computations in Time Series Analysis,”
Journal of the American Statistical Association, Vol. 20 (1925), pp. 75-79; cf.
also timesaving devices discussed in Chap. XXII.


of the results on a monthly basis can be visualized from the
analysis of the annual figures and the annual data can be analyzed
with a much smaller amount of computation. The analysis
on an annual basis will help judge the kind of analysis required
for the monthly data, whether to use a straight-line trend or a
second- or third-degree polynomial trend. In addition, the
analysis on an annual basis will help to decide the significance
of the respective trends. This will now be illustrated by making
use of monthly and annual averages of monthly data on consumer
installment-sale debt for household appliances in the United
States, 1929-1942.¹

The work sheet is not reproduced here, but one similar to
Table 91 (page 594) was constructed and the following set of
subtotals was obtained: S₁ = 2,842, S₂ = 18,159, S₃ = 89,614,
and S₄ = 362,901 (in millions of dollars). Using the method
of orthogonal polynomials, which is the quickest method of
finding at once the first-, second-, and third-degree polynomial
trend lines by one set of calculations, the following results
were obtained by the application of Eqs. (5) and (8) to (10),
Chap. XXII (pages 605-612):

a = 2,842/13 = 218.61538

b = 18,159/91 = 199.54945

c = 89,614/455 = 196.95385

d = 362,901/1,820 = 199.39615

The values of the denominators in the above fractions were
obtained from Table 93 (page 613). These calculations are
carried to more places than are significant for the problem because
they must be combined in multiple proportions to obtain the
following results by using Eqs. (9) and (10), Chap. XXII
(pages 611-612):

¹ Holthausen, Duncan McC., “Monthly Estimates of Short-term
Consumer Debt, 1929-1942,” Survey of Current Business, Vol. 22
(November, 1942), pp. 9-25, 17.





α' = 218.61538          A = 218.61538

β' = 19.06593           B = 9.53296

γ' = 13.87471           C = 3.15334

δ' = -6.12367           D = -0.64948
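How the primed quantities follow from the subtotals can be sketched as below. This rests on two assumptions that are not spelled out in this section: that the divisors are 13, 91, 455, and 1,820 for a series of 13 years, and that the quotients are combined by the rule shown in the comments (the usual rule for the method of successive sums). Both are adopted here only because they reproduce the printed figures to within rounding. The further step from the primed quantities to A, B, C, and D depends on Eqs. (9) and (10) of Chap. XXII and is not attempted.

```python
# Subtotals quoted in the text (millions of dollars), N = 13 years.
S1, S2, S3, S4 = 2842, 18159, 89614, 362901

# Quotients; the divisors are assumed to be the Table 93 entries for N = 13.
a = S1 / 13
b = S2 / 91
c = S3 / 455
d = S4 / 1820

# Assumed combination rule for the successive-sum quotients.
alpha = a
beta = a - b
gamma = a - 3 * b + 2 * c
delta = a - 6 * b + 10 * c - 5 * d

print(round(alpha, 5), round(beta, 5), round(gamma, 5), round(delta, 5))
```

The results agree with the printed 218.61538, 19.06593, 13.87471, and -6.12367 except for small differences in the last decimal places.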

Accordingly, the following set of orthogonal-polynomial
trends is obtained; the first line is the straight-line trend; the
first line combined with the second line is the second-degree
polynomial trend; the first, second, and third lines combined
give the third-degree polynomial trend:

y' = 218.61538 + 9.53296p₁
               + 3.15334p₂
               - 0.64948p₃

that is to say, since p₁ = t, p₂ = t² - 14, and p₃ = t³ - 25t
when N = 13 (see pages 605 and 615),

y' = 218.61538 + 9.53296t
               + 3.15334(t² - 14)
               - 0.64948(t³ - 25t)

The three possible trends are therefore the following (in
millions of dollars):

Straight-line trend:

y' = 218.6 + 9.533t (origin at 1935)

Second-degree polynomial trend:

y'' = 174.5 + 9.533t + 3.153t² (origin at 1935)

Third-degree polynomial trend:

y''' = 174.5 + 25.77t + 3.153t² - 0.6495t³ (origin at 1935)

Table 99 is presented to show the raw annual data and the 
annual values of each of these three trends, which are also 
presented graphically in Fig. 150. 
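The passage from the orthogonal form to the three ordinary trend equations, and the annual trend values of Table 99, can be reproduced with the sketch below. The function names are assumptions made for the example; the coefficients and the polynomials p₁ = t, p₂ = t² - 14, p₃ = t³ - 25t are those given in the text for N = 13, with t counted in years from 1935.

```python
A, B, C, D = 218.61538, 9.53296, 3.15334, -0.64948   # coefficients from the text


def straight(t):
    """Straight-line trend: A + B*p1, with p1 = t."""
    return A + B * t


def second(t):
    """Second-degree trend: adds C*p2, with p2 = t**2 - 14."""
    return straight(t) + C * (t ** 2 - 14)


def third(t):
    """Third-degree trend: adds D*p3, with p3 = t**3 - 25*t."""
    return second(t) + D * (t ** 3 - 25 * t)


# Collecting powers of t gives the ordinary polynomial coefficients.
constant_term = A - 14 * C      # constant of the second- and third-degree trends
linear_term_3 = B - 25 * D      # coefficient of t in the third-degree trend

print(round(constant_term, 1), round(linear_term_3, 2))
for year in range(1929, 1943):
    t = year - 1935
    print(year, round(straight(t)), round(second(t)), round(third(t)))
```

Evaluating the orthogonal form directly, as done here, avoids carrying the rounded coefficients 174.5 and 25.77 into the yearly values; an occasional entry may still differ from the printed table by one unit in the last place.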

An annual increment averaging something less than 10 on a
base of over 200 is not too great to deter the assumption that the
straight-line trend roughly depicts rational long-term growth.
If it is appropriate to suppose that installment-sale debt would
tend to grow at the rate of population growth, which is
presumably geometric, the trend line fitted should be a logarithmic



Table 99. Trend Analysis of Consumer Installment-sale
Debt for Household Appliances in the United States, 1929-1942
(In millions of dollars)

Year     Raw data    Straight-line    Second-degree    Third-degree
                     trend            polynomial       polynomial

1929       244           161              231              274
1930       236           171              206              206
1931       200           180              187              163
1932       140           190              174              143
1933       114           200              168              141
1934       125           209              168              152
1935       151           219              174              174
1936       209           228              187              203
1937       292           238              206              233
1938       277           247              232              263
1939       259           257              263              286
1940       278           266              301              301
1941       317           276              345              302
1942                     285†             396†             287†

trend; but for a short period of years the straight-line trend is an
adequate approximation of the more appropriate logarithmic
trend. The straight-line trend in the data presently studied may
consequently be used as the base from which the major cycle in
the data can be measured.

The second-degree polynomial trend shows a sharp rise for the
later years, and if extrapolated beyond 1942 it would quickly
approach infinity; it is not, therefore, a reasonable picture of
rational growth. The straight line comes nearer to what would
be the result if a logarithmic trend were fitted, if data covering
a long enough period were available to afford sufficient
perspective to obtain a growth curve.


Table 100. Cyclical Movements in Consumer Installment-sale
Debt for Household Appliances in the United States, 1929-1942

Year    Raw data,     Straight-line    Third-degree     Cycle,        Cycle mixed
        millions of   trend,           polynomial       per cent      with residuals,
        dollars       millions of      trend,           y'''/y'       per cent
        y             dollars          millions of      (y' = 100)    y/y'
                      y'               dollars                        (y' = 100)
                                       y'''

1929      244            161              274             170            152
1930      236            171              206             120            138
1931      200            180              163              90            111
1932      140            190              143              75             74
1933      114            200              141              70             57
1934      125            209              152              73             60
1935      151            219              174              79             69
1936      209            228              203              89             92
1937      292            238              233              98            123
1938      277            247              263             106            112
1939      259            257              286             111            101
1940      278            266              301             113            104
1941      317            276              302             109            114


The third-degree polynomial trend seems adequately to
represent the rounded contour of a major cycle. Examination
of Fig. 150, accordingly, leads to the conclusion that the
straight-line trend can be used to depict growth in the data, and the
third-degree polynomial trend can be used to measure the major
cycle. As a consequence, the raw data divided by the
straight-line trend should give a picture of the major cyclical movement








in this period (plus the residual fluctuations),¹ and the
third-degree polynomial trend divided by the straight-line trend gives
a measure of the major cycle. Table 100 and Fig. 151 give the
results of such computations. The column headed Cycle in
Table 100 consists of the third-degree polynomial empirical
trend divided by the straight-line empirical trend, giving as a
result a smoothed measure of the major cycle. This is shown
by the heavy line in Fig. 151. The column headed Cycle mixed


Fig. 151. Cyclical study of consumer installment-sale debt for household
appliances in the United States, 1929-1942.

with residuals in Table 100 consists of the raw data divided by
the straight-line empirical trend, giving as a result the measure
of the cycle mixed with residual fluctuations in annual data.²
Both these columns are expressed as percentages, with the y'
for each year equal to 100.
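The two percentage columns can be recomputed from the rounded trend values as in the sketch below (variable names are assumptions made for the example). Since the inputs here are already rounded to whole millions, an occasional result may differ by one unit from the printed column.

```python
years    = list(range(1929, 1942))
raw      = [244, 236, 200, 140, 114, 125, 151, 209, 292, 277, 259, 278, 317]   # y
straight = [161, 171, 180, 190, 200, 209, 219, 228, 238, 247, 257, 266, 276]   # y'
third    = [274, 206, 163, 143, 141, 152, 174, 203, 233, 263, 286, 301, 302]   # y'''

# Smoothed major cycle: third-degree trend as a percentage of the straight line.
cycle = [round(100 * y3 / y1) for y3, y1 in zip(third, straight)]
# Cycle mixed with residuals: raw data as a percentage of the straight line.
mixed = [round(100 * y / y1) for y, y1 in zip(raw, straight)]

for row in zip(years, cycle, mixed):
    print(*row)
```

Both columns put the straight-line trend equal to 100, so a value of 70 means 30 per cent below the growth line and a value of 152 means 52 per cent above it.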

Determination of Cycle in Monthly Data. Cycle Determined
by Adjusting Monthly Data. Monthly data, when examined to
discover the cycle, must be adjusted not only for trend but also
for seasonal variations. Adjusting monthly data for trend
and seasonal variation in order to measure cyclical movements

¹ In addition, there might be minor cyclical movement, a fact that could
be determined by further analysis of data extending over a longer period of
time.

² See p. 570 for meaning of “residual fluctuations” in time series. In
this instance, the residuals might include short-cycle fluctuations. See
also Chap. XXV, pp. 659-661.



Table 101. Work Sheet for Calculating Monthly Index of Cycle
Description of Data: Consumer installment-sale debt for purchase of
household appliances, United States
Source of Data: Survey of Current Business, Vol. 22 (1942), pp. 9-25, 17.

     (1)           (2)         (3)          (4)            (5)            (6)
Year and month   Raw data,   Monthly     Index of      Monthly trend    Cycle,
                 millions    trend,*     seasonal      times index      per cent
                 of dollars  millions    variation,†   of S.V.,         y/(y' × S.V.)
                 y           of dollars  per cent      millions
                             y'          Av. = 100     of dollars
                                                       y' × S.V.

1940:
  January         262        262.0        96.2           252.0          104.0
  February        255        262.8        94.0           247.0
  March           253        263.6        92.5           243.8          103.8
  April           259        264.4        95.2           251.7
  May             271        265.2        97.6           258.8          104.7
  June            281        266.0       103.2           274.5          102.4
  July            288        266.8       104.7           279.3          103.1
  August          294        267.6       107.6           287.9          102.1
  September       293        268.4       105.0           281.8          104.0
  October         290        269.1       102.2           275.0          105.4
  November        290        269.9       100.6           271.5          106.8
  December        302        270.7       101.1           273.7          110.3
1941:
  January         290        271.5                       261.2
  February        286        272.3
  March           286        273.1                       252.6
  April           303        273.9                       260.8
  May             320        274.7                       268.1
  June            330        275.5                       284.3
  July            336        276.3                       289.3          116.1
  August          346        277.1                       298.2          116.0
  September       342        277.9                       291.8          117.2
  October         333        278.7                       284.8
  November        320        279.5                       281.2
  December        313        280.3                       283.4
1942:
  January         294        281.1                       270.4          108.7
  February        285        281.9                       265.0
  March           272        282.7                       261.5
  April           258        283.5                       269.9           95.6
  May             241        284.3                       277.5           86.8
  June            219        285.1                       294.2           74.4
  July            202        285.9                       299.3           67.5
  August          183        286.7                       308.5           59.3
  September       169        287.4                       301.8
  October
  November
  December

* Equation of monthly trend: y' = 219.0 + 0.796t (origin July, 1935).
† Necessary to copy this for only one year; this seasonal variation was
calculated for illustrative purposes in Chap. XXIII, pp. 625-631.

















But if the annual figures are annual averages of monthly data,
then the monthly equation of trend will be

y' = a + (b/12)t

Thus, b must be divided by 12 because in the monthly trend
equation the annual increment is distributed among 12 parts
(t now stands for months instead of years). In other words, if b
is the annual increment, b/12 is the monthly increment. But if
b is the annual increment of total annual data (sum of the 12
months each year), then to put it on a monthly basis it is
necessary first to convert it to a monthly figure by dividing by 12;
it is then still an annual increment and has to be divided by 12
again to obtain the monthly increment.

In the trend equation, a can be assumed to be at June-July of
the origin year, and of course the origin may be shifted by changing
accordingly the value of a. The origin is at the middle of the
year, i.e., between June and July. For example, the equation
of trend found for the annual data on consumer installment-sale
debt for household appliances is (in millions of dollars)

y' = 218.6 + 9.53t (origin at 1935)

The data used in this illustration are annual averages of monthly
data; so the monthly trend equation is (in millions of dollars)

y' = 218.6 + 0.796t (origin at June-July, 1935)

in which the unit of t is 1 month.

By adding algebraically half a monthly increment to 218.6,
the origin is shifted to July, 1935, and the approximate equation
of monthly trend is as follows (in millions of dollars):

y' = 219.0 + 0.796t (origin at July, 1935)
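The conversion from the annual to the monthly trend equation can be sketched as follows. The variable names are assumptions made for the example. Direct division of 9.533 by 12 gives a monthly increment of about 0.794, slightly below the 0.796 printed in the text; the sketch reports the computed value and leaves the printed equation as it stands.

```python
# Annual trend on annual averages of monthly data: y' = 218.6 + 9.533t,
# origin at 1935, t in years.
a_annual, b_annual = 218.6, 9.533

# For annual averages only the slope changes: one-twelfth of the yearly increment.
b_monthly = b_annual / 12

# For annual totals the increment would have to be divided by 12 twice.
b_monthly_if_totals = b_annual / 144

# The origin of the annual equation lies between June and July 1935;
# adding half a monthly increment moves it to July 1935.
a_july = a_annual + b_monthly / 2

print(round(b_monthly, 3), round(a_july, 1))
```

The shift of the constant term is small (about 0.4 million dollars) but it keeps each monthly trend value opposite the month to which it belongs.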


Solving this equation for different values of t (from t = 54 at
January, 1940, to t = 86 at September, 1942) gives the various
monthly values of y' shown in column (3) of Table 101 under the
caption Monthly Trend. Column (4) shows the index of seasonal
variation, which in Chap. XXIII was calculated by the
12 months' moving total method. Column (5) shows seasonal
variation and trend combined by multiplication, and column (6)
is obtained by dividing the monthly items in column (2), the
raw data, by the monthly items in column (5). By this last
operation, both trend and seasonal variation are removed from
the raw data; the resulting index gives an idea of how high or low
the raw data are in comparison to what they might be expected
to be according to usual seasonal variation and trend.
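The construction of columns (3), (5), and (6) of Table 101 can be sketched for the year 1940 as below. The names are assumptions made for the example; the trend equation is used as printed (219.0 + 0.796t, t in months from July 1935), the seasonal index is that of Table 97, and the raw figures are those of Table 96. Because the sketch multiplies the unrounded trend, an occasional entry may differ from the printed work sheet in the last decimal place.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

raw_1940 = [262, 255, 253, 259, 271, 281, 288, 294, 293, 290, 290, 302]      # column (2)
seasonal = [96.2, 94.0, 92.5, 95.2, 97.6, 103.2,
            104.7, 107.6, 105.0, 102.2, 100.6, 101.1]                        # column (4)


def monthly_trend(t):
    """Monthly trend as printed: y' = 219.0 + 0.796t, origin July 1935."""
    return 219.0 + 0.796 * t


rows = []
cycles = []
for i, (y, sv) in enumerate(zip(raw_1940, seasonal)):
    t = 54 + i                         # t = 54 at January 1940
    trend = monthly_trend(t)           # column (3)
    expected = trend * sv / 100        # column (5): trend times seasonal index
    cycle = 100 * y / expected         # column (6): raw data as per cent of (5)
    cycles.append(cycle)
    rows.append((MONTHS[i], round(trend, 1), round(expected, 1), round(cycle, 1)))

for row in rows:
    print(*row)
print("annual average of column (6):", round(sum(cycles) / 12, 1))
```

The annual average of the twelve cycle figures comes out close to 104, the 1940 entry in the last column of Table 100.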

Data thus treated over a series of years disclose information
about the time series that it is not possible to visualize from the
raw figures. It makes possible the comparison between cyclical
and minor cyclical fluctuations in time series otherwise
concealed by disturbing elements of seasonal variation and trends.
If this monthly analysis of the data on consumer installment-sale
debt for household appliances in the United States were done
for the entire period 1929-1942, the picture of monthly data
would, of course, resemble the broken line of Fig. 151. The
annual averages of the monthly data, which contain cyclical
movements mixed with residual movements, would be equal to
the figures shown in the final column of Table 100. In this
connection it is to be noted that the annual averages of column (6)
in Table 101 are equal to the corresponding annual figures in the
last column of Table 100; for 1940 the annual average of the
figures in column (6) of Table 101 is equal to 104, and for 1941
the annual average of the figures in column (6) of Table 101
is equal to 114.

From the results of calculations in Table 101 it may be
concluded that consumer installment-sale debt for household
appliances reached the peak of a cycle in May, 1941, remaining
10 to 17 per cent above normal throughout 1941. In 1942 a
sharp decline materialized; in fact, this decline, on a monthly
basis, was rapid after October, 1941. The raw data appear to
indicate that the peak of the cycle occurred in August, but this is
due to the effect of seasonal variation and trend. When
seasonal variation and trend are taken into consideration, the
cyclical peak is found to be in May, 1941. From July, 1940, to
August, 1940, the raw data show an increase, but a cyclical
decline occurred in that period. Removal of seasonal variation
and trend makes it apparent that the appearance of a rise from
July, 1940, to August, 1940, was due to seasonal influences and
trend.

Adjustment of data by removing seasonal variation and trend 
makes it possible to judge quickly whether or not consumer 
installment-sale debt for household appliances is rising (or falling)
more rapidly than seasonal variation and trend would lead us to 
expect. The resulting figures are frequently described when 
published by saying that the data are “adjusted for seasonal 
variation and trend.” Sometimes, if trend is unimportant or 
of dubious character, only seasonal variation is removed and the 
data are described as “adjusted for seasonal variation.” Charts 
of such data appear frequently in financial publications and in 
the financial sections of metropolitan newspapers. 
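
The adjustment just described can be sketched in a few lines of
Python. This is only a minimal illustration, assuming that the monthly
trend values and the seasonal index (written here as a proportion, so
that 1.05 means 5 per cent above the monthly average) have already
been computed. The function name and all figures are hypothetical and
are not taken from Table 101.

```python
def adjust_for_seasonal_and_trend(raw, trend, seasonal_index):
    """Express each raw figure as a percentage of trend times seasonal index."""
    adjusted = []
    for y, t, s in zip(raw, trend, seasonal_index):
        normal = t * s  # value expected from growth and season alone
        adjusted.append(100.0 * y / normal)
    return adjusted


# Hypothetical figures for three consecutive months.
raw = [126.0, 132.0, 110.0]
trend = [100.0, 100.0, 100.0]
seasonal_index = [1.05, 1.20, 1.00]

adjusted = adjust_for_seasonal_and_trend(raw, trend, seasonal_index)
```

In this made-up case the raw figures rise from the first month to the
second, while the adjusted figures fall from about 120 to about 110,
which is the kind of concealed cyclical decline discussed above.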

Measuring the Cycle Where Trend Is a Second- or Third-degree
Polynomial. The rational growth of some data is better described
by a second-degree polynomial, as discovered in Chap. XXI.
When such is the case, it would be necessary to use a second-degree
polynomial instead of a straight-line trend as illustrated
in the preceding sections of this chapter.

A third-degree polynomial is not likely ever to resemble a
rational growth element in a time series, but it may resemble the
conformation of the major cycle during a specified period covered
by data that are being analyzed. If it is desired not only to
remove growth trend but also to remove from the data the effects
of the major cycle in order to observe residuals that might be
significantly described as a short cycle, the method described in
the preceding sections could be used with monthly data, applying
the same principles to the removal of a third-degree polynomial
trend combined with seasonal variation that were applied
to the removal of a straight-line trend combined with seasonal
variation.

The general form of the second-degree polynomial annual
trend is $y' = a + bt + ct^2$, where $t$ is 1 year. The equation of the
monthly second-degree polynomial trend, where the annual
data are annual averages of monthly data, would be

$$y' = a + \frac{b}{12}t + \frac{c}{144}t^2$$

where $t$ is 1 month.





The general form of the third-degree polynomial annual trend
is $y' = a + bt + ct^2 + dt^3$, where $t$ is 1 year. The equation of
the monthly third-degree polynomial trend, where the annual
data are annual averages of monthly data, would be

$$y' = a + \frac{b}{12}t + \frac{c}{144}t^2 + \frac{d}{1728}t^3$$

in which $t$ is 1 month.

If the data are annual totals, instead of annual averages,
every item on the right side of the respective equations will be
divided by 12.¹
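
The conversion from annual to monthly coefficients can be checked
numerically. The sketch below is an illustration only: it assumes that
both equations are measured from the same origin, so that the only
change is in the unit of $t$, and the function names and coefficients
are hypothetical.

```python
def monthly_trend_coefficients(annual_coeffs, annual_totals=False):
    """Convert annual polynomial trend coefficients to monthly ones.

    annual_coeffs lists a, b, c, ... for y' = a + b*t + c*t**2 + ...
    with t in years; the result applies with t in months.
    """
    monthly = []
    for power, coeff in enumerate(annual_coeffs):
        value = coeff / 12 ** power  # b/12, c/144, d/1728, ...
        if annual_totals:
            value /= 12  # annual totals: every term is divided by 12 again
        monthly.append(value)
    return monthly


def trend_value(coeffs, t):
    """Evaluate a + b*t + c*t**2 + ... at time t."""
    return sum(coeff * t ** power for power, coeff in enumerate(coeffs))


# Hypothetical annual trend fitted to annual averages of monthly data.
annual = [100.0, 12.0, 1.44]
monthly = monthly_trend_coefficients(annual)
```

With these figures the monthly equation evaluated at 12 months agrees
with the annual equation evaluated at 1 year, as it should when the
origin is unchanged.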

Danger in Extrapolating Trends. In the illustration on page
641 it was noted that the second-degree polynomial trend, if
extended beyond the year 1941, would quickly go up to infinity.
Thus the extrapolation, or extension of this trend beyond 1942,
very soon becomes an absurdity. This shows the need for
caution in the projection, or extrapolation, of empirical trends.²
Their projection for short periods of time (how long depends
upon the conditions of each particular case) is a valuable aid in
constructing barometric indexes.

A troublesome unsolved problem in time-series analysis is to
know when trend is changing and also, for that matter, when
seasonal variation may be changing. Neither the statisticians
nor the economists have solved this problem, but they realize
that it is ever present in time-series analysis. It is desirable,
therefore, to be cautious about extending empirical trends into
the future and to reexamine monthly data for seasonal variation
at frequent intervals. A method for detecting changing trends
in seasonal variation was explained and illustrated in the
preceding chapter.

¹ See pp. 644-645.

² For further discussion in connection with economic forecasting see
Chap. XXV, pp. 661-671.

Method of Ratios vs. Method of Differences. In general, the
method here presented for removing one or more types of variation
from time series has been the method of division, or ratios.
In other words, the raw data are expressed as percentages of
computed trend and seasonal variation. This is not the only
method of removing trend and seasonal variation from the
monthly or annual raw data. Another type of approach is called
the “method of differences,” which, in one of a number of forms,
may be summarized as follows:

1. Assuming that the index of seasonal variation and the
monthly trend have been calculated by one of the conventional
procedures, the monthly trend values are multiplied by the
index of seasonal variation.

2. The trend values multiplied by seasonal variation ($y'_t$) are now
subtracted from the raw data ($y_t$).

3. This gives a series $y_1 - y'_1, y_2 - y'_2, \ldots, y_n - y'_n$, the
arithmetical amount, in original units (pounds, dollars, etc.),
by which the raw data are greater each month, or less, than the
computed value for trend multiplied by the index of seasonal
variation. These residuals of the raw data from trend and
seasonal variation form a series $r_1, r_2, r_3, \ldots, r_n$ that would be
a time series fluctuating arithmetically above and below zero,
according to whether the raw data were above or below trend
multiplied by seasonal variation, that is to say, according to
whether the raw data were greater or less than values expected
in view of the anticipated growth and seasonal fluctuation.

4. These residuals are in terms of the quantity units of the
raw data. Consequently, it would be very difficult to compare
the residual fluctuations of a series measured in bushels (say
wheat production) with the residuals in dollars (say the price of
wheat). It is necessary to find a common denominator in order
to compare the residuals in various time series, obtained by the
arithmetical difference method.

5. The common denominator used is the standard deviation
in the residuals. $\sigma_r$ will be simply $\sqrt{\Sigma r^2/N}$, since their
arithmetical average is zero. Each $r$ divided successively by $\sigma_r$ would
give a series in terms of standard-deviation units that could
thereafter be compared with other time series similarly treated.
Various series, whether the original units were dollars, pounds,
inches, etc., will now be reduced to terms of standard-deviation
units and can all be plotted on the same scale, namely, a scale
that is calibrated in standard deviations.
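
The five steps above can be sketched as follows. This is a minimal
illustration under stated assumptions: the trend values and the
seasonal index (written as a proportion) are taken as already
computed, the function name is hypothetical, and the figures are
invented for the example.

```python
import math


def method_of_differences(raw, trend, seasonal_index):
    """Return residuals and residuals in standard-deviation units."""
    expected = [t * s for t, s in zip(trend, seasonal_index)]  # step 1
    residuals = [y - e for y, e in zip(raw, expected)]         # steps 2-3
    # Step 5: the text takes the mean of the residuals as zero,
    # so sigma_r is the root mean square of the residuals.
    sigma = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    standardized = [r / sigma for r in residuals]
    return residuals, standardized


# Hypothetical monthly figures in original units.
raw = [110.0, 90.0, 105.0, 95.0]
trend = [100.0, 100.0, 100.0, 100.0]
seasonal_index = [1.0, 1.0, 1.0, 1.0]

residuals, standardized = method_of_differences(raw, trend, seasonal_index)
```

The standardized series has a root mean square of 1 whatever the
original units, which is what allows series in bushels and in dollars
to be plotted on one scale. The sketch would fail if every residual
were exactly zero, a case a fuller treatment would have to guard
against.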

One important disadvantage in the method of differences
persists even after the residuals are expressed in terms of their
own standard deviations. The residuals will tend to be
arithmetically greater when trend is at high values and arithmetically
small when trend is at low values. This means that the impression
will almost universally arise from such an analysis that as
things grow they become subject to more violent fluctuations.
In fact, when considered relative to the magnitudes from which
variation occurs, these greater arithmetical variations may
really be less important. The ratio method places at all times the
proper proportional emphasis upon arithmetical fluctuations by
expressing them as a ratio to the trend and seasonal variation.
On the other hand, by the same token, the ratio method may
tend to minimize the importance of fluctuations, for it may be
possible that the proportional amount of change is not so
significant as the actual amount of change. For example, the fact
that the amount of unemployment is no greater proportionally
may not necessarily dispose of the fact that the actual amount of
unemployment at some particular time is very great and the
corresponding personal problems distressing in the extreme.
Whatever method of statistics is used, it is necessary for the
analyst to keep his eyes open to the effect the method itself may
have upon his results.
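
The contrast between the two methods can be seen in a small numerical
sketch. The figures are hypothetical: the raw data are taken to run a
constant 10 per cent above a growing "normal" (trend multiplied by
seasonal variation).

```python
# Hypothetical series: raw data run 10 per cent above a growing "normal".
normal = [100.0, 200.0, 400.0]  # trend times seasonal index
raw = [1.10 * n for n in normal]

differences = [y - n for y, n in zip(raw, normal)]     # method of differences
ratios = [100.0 * y / n for y, n in zip(raw, normal)]  # method of ratios
```

The differences grow with the level of the series, while the ratios
stay at about 110 throughout. Dividing the differences by their
standard deviation only rescales them by a constant, so the impression
of growing fluctuations remains.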



PART VI 
Forecasting 

CHAPTER 

THE ART OF FORECASTING WITH STATISTICS 
INTRODUCTION 

Prevalence of Forecasts, Ancient Origin of Pseudoscientific
Forecasts. The human desire to look into the future led, even
in ancient times, to the rise of various forms of pseudoscientific
forecasts. Oracles were frequently consulted as to the outcome
of a contemplated military campaign, business venture, or love
affair. Among the most famous of these was the Delphian
oracle. Astrologists were, and still are, consulted for what the
stars have to say; one of their most prominent devotees in
modern times is said to be Adolf Hitler.

It was partly to disprove some of these astrological notions
that statistical method was first undertaken on a scientific
basis. In the seventeenth century an idea prevailed that the
phases of the moon influenced health; also, health was supposed
to be critical every seventh year and life particularly hazardous
at the ages of forty-nine and sixty-three. Near the end of the
seventeenth century studies of vital statistics by Capt. John
Graunt of London and Casper Neumann of Germany disproved
the connection between health and the phases of the moon as
well as the fateful significance of every seventh year in life.
Other similar superstitions were “debunked” by statistical
studies. From the beginning of the history of the modern
money market, attempts have been made to devise some way
to forecast the course of financial affairs. For the Antwerp
Bourse in 1543, Christopher Kurz is said to have contrived an
astronomical method of making prophecies about the money
market.¹

¹ Ehrenberg, R., Capital and Finance in the Age of the Renaissance,
p. 2 JO. 




Modern Scientific Forecasts. Although forecasting was thus
once the special prerogative of soothsayers, today it has been
placed upon a broader basis by the development of science.
For one of the objectives of science is, precisely, to forecast.
Science seeks to classify and determine relationships that may be
used for purposes of prediction. Every scientific law is, in a
certain sense, a forecast. It foretells what will happen under certain
circumstances. The law of gravitation says, for example, that
if a ball is dropped from a tall building it will fall with an
acceleration of 32 feet per second per second. Boyle’s law says that
the pressure in a given container varies, and will vary, directly
with the temperature and inversely with the volume. Scientific
astronomy makes it possible to forecast the tides, to construct the
calendar for our mundane affairs, and, in addition, to forecast
celestial events such as the date of the next visit of Halley’s
comet. There are no “ifs” or “buts” about the modern scientific
forecasts in the realm of the natural or physical sciences.

Popular Dramatization of Forecasts. The depression of the
1930’s did more than hundreds of books could have done to make
people cycle-conscious. So general was the interest in cyclical
behavior that by 1940 the Foundation for the Study of Cycles
was set up as a nonprofit organization with an international
committee composed of scientists and businessmen. This
foundation proposed to help in the task of integrating the work
of the thousands of scientists and statisticians who are
contributing in various fields to the study of cycles. Not only have
cycles been found to exist in the realm of business activity, but
scientists in many other fields believe they have discovered
cyclical behavior in their respective studies. For example,
psychologists have discovered that human beings have regular
ups and downs in their emotional life, following a cyclical
pattern. Biologists have discovered what appear to be regular
fluctuations in animal, insect, bird, and even fish populations.
In 1937, Prof. William Hamilton of Cornell University, upon the
basis of cycle studies, warned farmers and housewives of New
York State to prepare for a scourge of mice in the winter of
1939-1940 and for another outbreak in 1943-1944. While it
may still be too early to put the stamp of final scientific approval
upon all these cyclical discoveries, they are nevertheless making
important contributions to knowledge.




Some of the twentieth-century discoveries sound almost like
the pseudoscientific superstitions of the Middle Ages. Thus in
1943 the public was advised “to look for skunks under your
front porch about 1945.” It was claimed that an answer could
be given to such questions as: Will you feel happy or gloomy a
month from today? Such statements were made as: If you
are born in January, February, March, or April, the chances
are you will live longer than people born in July, August, or
September.¹ These notions about forecasting suggest a precision
in statistical forecasts that they probably will never possess.²

Conditional Scientific Forecasts. A forecast, to be scientific,
does not have to be unconditional; in fact, most forecasts in the
realm of the social sciences and some in the realm of the physical
sciences are hypothetical in character. Indeed, in its largest
sense, forecasting must be taken to mean prediction of not only
what will happen but what would happen under given
hypothetical conditions. Not only must the predictions of the
meteorologist and stockbroker be considered forecasts, but also
predictions of the engineer as to the outcome of certain plans
and the warnings of the economist as to the effect of certain
proposed actions of Congress are forecasts. The latter are
conditional forecasts.

Many predictions of coming events are hedged in by all sorts
of weasel-like conditions. It may be said that private enterprise
will disappear if Republicans are not elected. Or an economist
may predict that a Congressional increase in tariff rates will
cause exports to decline, provided that foreign countries do not
offset our higher tariff by giving bounties to their exporters, or
that foreign demand for American products does not increase
for some unforeseen reason, or that American exports do not
become less costly to produce. Such forecasts are conditional, or
hypothetical, in character.

¹ Dewey, Edward Russell, “Science [illegible] the Future,” American
[illegible], Vol. [illegible] (1943), pp. 90-92.

² See More Exact Forecasting and Less Exact Forecasting, pp. 659-661.
See also Smith and Duncan, Sampling Statistics and Applications.

The practical worth of a forecast depends, not on whether
it is conditional or unconditional, but on how much knowledge
the forecaster actually has of the relevant conditions. An
unconditional forecast may be merely a wild guess and have little
value, in spite of its uncompromising and categorical appearance,
while a carefully drawn conditional forecast may be of great value
in spite of its “pussyfooting” aspect. In the case of the latter, it
may be that the likelihood of the conditional factors is very
slight and that they are mentioned only to guard the forecaster
from unwarranted criticism. On the other hand, if the disturbing
factor has a fair likelihood of occurrence, the nature of its
effect might be forecast so that the recipient of the forecast
could be on his guard against this factor; by watching it, he
might know when to abandon his faith in the original forecast.
For example, if a prediction of rain tomorrow is based merely
on the fact that it looks somewhat cloudy today, the forecast
would probably be of little value (in the sense that such forecasts
would probably be wrong more often than they were right).
In contrast, if a trained observer predicted rain after a thorough
observation of the weather situation, this would have considerable
value even if he hedged his prediction by saying that the
rain might not occur if the wind in a neighboring area shifted
before a certain time.

Qualitative vs. Quantitative Forecasts. Most forecasts are
qualitative in character. The meteorologist says it will rain
but does not always say how heavily. The economist may
predict that the effect of an increase in the tariff will be to raise
prices, but he does not often say to what degree. The meteorologist,
on the contrary, may give the approximate time when rain
is expected and how many inches are expected to fall, and the
economist may try to estimate the average foreseen rise in prices.
The latter would be quantitative forecasts.

It will be noted that forecasts may be quantitative in two
ways: with reference to the degree of the predicted change and
with reference to the time of occurrence. The success of
forecasting must be judged, not only on the basis of whether the
forecast was correct, but also on how far the forecast went in
actually describing the future event: its quantity and its timing.

Illustrations of Modern Forecasts. In the modern world
forecasts are applied in many fields. Predictions of astronomical
events, as already indicated, have been among the earliest and
most successful forecasts. The movements of the moon, the
planets, and other heavenly bodies have been computed with
considerable accuracy so that their future course may be
predicted with great precision. Forecasts of certain eclipses,
for example, have been only a few seconds in error in timing.
In this connection, it is interesting to note that the theory of
least squares was largely developed in the attempt to forecast
the paths of the heavenly bodies.

Closely akin to astronomical forecasts have been forecasts
of weather conditions. Short-range forecasts are based mainly
on wind conditions and barometric pressures, but long-range
forecasts are sometimes attempted from the study of rainfall
data, sunspots, and the like. In some instances, studies of
growth rings in old trees have yielded weather data going back
many years. These studies usually look for cyclical fluctuations
that will indicate periods of high and low activity and permit
long-range forecasting. Studies of average weather conditions
and the dispersion around these averages also afford forecasts
of the variability of conditions in different areas and hence
suggest the more desirable airports and air routes.

Engineers make many forecasts. A water-power engineer
will forecast the amount of power to be obtained from a dam of
given size built in a given river. Another engineer may predict
the breaking strength of given kinds of wire at different
temperatures. Still another may predict the maximum load to be
sustained by a given bridge.

From the laws of Mendel, biologists make predictions of
results to be expected from crossbreeding. Agronomists will
predict the average results to be obtained from the use of certain
fertilizers, or certain methods of cultivation, or certain varieties
of crops. Agricultural economists attempt to predict the
effect of certain sized crops on the future prices of important
commodities or the effect of certain prices on the future volume
of production.

Business economists attempt many kinds of forecasts. From
studies of factors closely related to the sale of a certain product
in a given area where the trade has been well established,
forecasts may be made of the sales to be obtained from new untapped
areas of similar character. Other economic forecasts aim to
predict the ups and downs of the business cycle in various lines
of activity. Probably the greatest percentage of economic
forecasts are devoted to predictions of the stock market, money
rates, bond prices, and security prices.





These are but a few of the examples of forecasting. It is
probably true that forecasting in all its ramifications is prevalent
in modern life.

Use of Statistics in Forecasting. This chapter attempts to
outline the use that may be made of statistical analysis in making
forecasts. Details of the methods of forecasting are beyond the
scope of this volume, which is not a book on forecasting but
merely includes a chapter on the pattern of methods used in
forecasting. A few examples will be given to illustrate these
methods. The aim here is primarily to indicate the application
of different statistical techniques to the problems of forecasting.

Statistics affords a basis for forecasting in two principal
ways: (1) By studying monovariate and multivariate frequency
distributions, statistics are used to forecast average results and
the type and degree of dispersion around these results. (2) By
means of time-series analysis, statistics are used to predict the
course of events in time. Each of these applications to
forecasting will be discussed in the ensuing pages.

FORECASTS FROM DISTRIBUTION STUDIES 
Forecasts from Monovariate Distributions. If considerable
data have been obtained, forecasts from monovariate distributions
may yield good estimates of the mean, standard deviation,
coefficient of skewness, and kurtosis of the population from
which the data were derived. If such is the case, these population
estimates may be used to forecast the character of future
samples drawn from the given population.

Familiar matters relating to family care and health
conventionally rely upon forecasting by use of monovariates. Suppose
the frequency distribution or monovariate represents the
weights of boys of specified age. The mean of that distribution
is presumably normal weight for that age; the standard deviation
and kurtosis describe expected variability. From such a
monovariate and its statistics it is commonly inferred whether or
not a child is under normal weight and, if so, whether or not this
deficiency is sufficient to cause alarm; taken with other
evidence it may be the basis for the application of timely
therapeutic action.

In social control, monovariate distributions are used to
standardize products, involving the presumption of forecasting.
The fat content and standard deviation in fat content of milk
that has been produced and sold in the past constitute a set of
standards according to which it is ruled that milk sold in the
future will conform; thus milk is graded according to standards
found by frequency-distribution analysis. Methods of weight
or content are used by the Bureau of Standards to set standards
for many products, both in the raw state, such as grains of wheat,
and in the final product, such as bread; and abnormal variations
from these standards in the market are not permitted.

In business the use of monovariate distributions for
forecasting is widespread. For example, the distribution in sizes
of shoes sold by a retailer is used as the basis for forecasting his
future sales and for determining his reorders of additional shoes.
In such forecasting, the businessman is interested in forecasting
each class in the distribution rather than in the distribution’s
average and standard deviation. A similar forecasting procedure
is used by any retailer when he purchases articles that
are sold by size, which include most articles of clothing. The
wholesaler and the manufacturer also are interested in the same
type of forecasting, so that they may profit by having the
appropriate number of articles of various sizes continually
ready for the consumer; if the articles are there, ready for him
to buy when he comes, a minimum of consumers’ sales will be
lost.¹
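
Forecasting each class of a distribution, as the retailer does, can be
sketched as a simple proportional allocation. This is an illustration
only: the function name, the sizes, and the sales figures are
hypothetical, and it assumes that future sales will follow the same
size distribution as past sales.

```python
def forecast_by_size(past_sales, planned_total):
    """Allocate a planned sales total across sizes in past proportions."""
    total = sum(past_sales.values())
    return {size: count * planned_total / total
            for size, count in past_sales.items()}


# Hypothetical past sales (pairs) by shoe size.
past_sales = {"7": 20, "8": 50, "9": 30}
forecast = forecast_by_size(past_sales, planned_total=200)
```

Here each size receives the same share of the planned total that it
had of past sales, which is the sense in which the retailer forecasts
each class rather than the average of the distribution.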

Forecasts from Bivariate Distributions. Bivariate data may
likewise yield estimates of a bivariate population that may make
it possible to forecast results of future samples. Suppose, for
example, that it is found from a study of army records that there
is a high correlation between the Army General Classification
Test scores and the results obtained in a given electrical course.
To be specific, suppose that the bivariate distribution of these
two variables appears to be normal in form and it is estimated
¹ For another illustration of the use of monovariates in forecasting, see
Robert J. Myers, “Comparison of Demographic Rates Assumed by the
National Resources Committee with Actual Experience,” Journal of the
American Statistical Association, Vol. 38 (1943), pp. 201-209; also, for an
example of such forecasting for the purpose of control of quality of
manufactured product, see William B. Rice, “Quality Control Applies to
Business Administration,” Journal of the American Statistical Association,
Vol. 38 (1943), pp. 228-232; cf. W. A. Shewhart, Economic Control of
Quality of Manufactured Product (1931).





then it should have sufficient supplies of wool socks on hand to
provide for new issues every 100 days on the average.¹ If the
study indicated that the first-order standard deviation about the
plane of regression was about 5 days, the Army might keep on
hand a large enough supply to replace socks every 85 days (that
is to say, 100 minus three times the standard deviation, or
100 - 15 = 85).

Errors of Forecasts. In concluding this section on the use
of monovariates, bivariates, and multivariates as forecasters, it
must be noted that forecasts of the kind indicated are necessarily
inexact. They are based on the assumption that the population
is exactly known. When the population characteristics themselves
have to be estimated, as they usually do, then the forecasts
based upon these estimates will suffer from all the errors
involved in the latter. The more refined analysis that is required
to take care of these errors of estimate of the population is
beyond the scope of this book. It is sufficient here to point out
that the error of forecast based upon estimated population
characteristics is greater than that based upon a known population.
For example, if a plane of regression based upon sample estimates
has a related standard deviation of five points, the probability of
a forecast based on the plane being off by as much as two times
five points in either direction (therefore, 2σ) will be, not the
normal 5 per cent, but perhaps 10 per cent or more. Everything
depends on the size of the sample from which the original
estimates of the population characteristics were made.
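
The effect of sample size on the error of forecast can be illustrated
by a small simulation. The sketch below is not the regression case of
the text but a simpler stand-in, assumed for illustration: a new value
from a normal population is forecast by the sample mean, and a miss is
counted when the new value lies more than two estimated standard
deviations from that mean. The function name, the seed, and the sample
sizes are hypothetical choices.

```python
import math
import random


def miss_rate(sample_size, trials=10000, seed=0):
    """Share of forecasts that miss by more than two estimated sigmas."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        mean = sum(sample) / sample_size
        var = sum((x - mean) ** 2 for x in sample) / (sample_size - 1)
        sd = math.sqrt(var)
        new_value = rng.gauss(0.0, 1.0)  # the value being forecast
        if abs(new_value - mean) > 2.0 * sd:
            misses += 1
    return misses / trials


# Two-sided probability beyond 2 sigma when the population is known.
known_population = math.erfc(2.0 / math.sqrt(2.0))

small = miss_rate(5)   # estimates from a sample of 5
large = miss_rate(50)  # estimates from a sample of 50
```

With a known population the probability is a little under 5 per cent.
With estimates from a sample of 5 the simulated miss rate comes out
well above that figure, and with a sample of 50 it falls back close to
it, in line with the remark that everything depends on the size of the
sample.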

¹ For the summer season this would be less than for the winter season,
but a stock level based on a 100-day turnover might be taken as normal.

FORECASTING TRENDS WITH TIME SERIES

More Exact Forecasting. If much is known about a particular
time series, so that the nature of its growth and cyclical
movements can be fairly well determined from rational
considerations and if the remaining fluctuations are apparently
random, forecasting from such a series can be put on much the
same level as forecasting from distributions of the monovariate,
bivariate, and multivariate type discussed above. Careful
estimates may be made of the growth, and these may be
extrapolated for a short period of time into the future. Distribution
analysis of the random fluctuations will determine the range of
fluctuations around the growth curve and will afford an estimate
for the error of a forecast based solely on the growth element in
the series.

Suppose, for example, that a logistic form of growth appears to
be very logical for a certain type of data. If the data have
reached a certain stage of development, the values of the next
few periods may be forecast from an extrapolation of the logistic
curve fitted to the past data. The amount of error in the forecast
resulting from the random fluctuations around the normal
growth may be estimated from the standard deviation of the
residual fluctuations of the data from the fitted logistic curve.
Illustrations of the type of time series that would permit
fairly exact forecasting are afforded by Fig. 54 in Chap. V and
Figs. 144 and 145 in Chap. XXI.
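
A forecast of this kind can be sketched as follows. The block is only
an illustration under stated assumptions: the logistic curve is written
in one common form, its parameters are taken as already fitted (the
fitting itself is not shown), and the parameter values, function names,
and data are hypothetical.

```python
import math


def logistic(t, upper, a, b):
    """Logistic growth curve: upper / (1 + exp(a + b*t))."""
    return upper / (1.0 + math.exp(a + b * t))


def residual_sd(observed, fitted):
    """Root-mean-square deviation of the data from the fitted curve."""
    n = len(observed)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(observed, fitted)) / n)


# Hypothetical parameters, taken as already fitted to past data.
upper, a, b = 100.0, 2.0, -0.5
periods = [0, 1, 2, 3]
fitted = [logistic(t, upper, a, b) for t in periods]
# Hypothetical observations: fitted values plus small deviations.
observed = [f + d for f, d in zip(fitted, [1.0, -1.0, 1.0, -1.0])]

sd = residual_sd(observed, fitted)
forecast = logistic(4, upper, a, b)  # extrapolate one period ahead
band = (forecast - 2 * sd, forecast + 2 * sd)
```

The band of two standard deviations on either side reflects only the
random fluctuations around the curve. It takes the fitted parameters
as exactly known, so it understates the error of forecast in the way
described under Errors of Forecasts.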

Less Exact Forecasting. The real difficulty in most time-series
analyses is to determine what is random and what is not
random. Furthermore, it is often hard to work out any rational
basis for specific forms of the trends and cycles.¹ In cases
where there is no particular trend indicated by the rationale of
the situation, forecasts must be of a rough-and-ready sort, and
little can be done to determine the error of forecast.

Economic time series are generally of the sort that do not
permit more exact statistical forecasts.² For this reason
statistical analysis is usually only one of the elements entering into
the making of economic forecasts. In some cases it plays a more
important role than others, but nearly always the forecaster
incorporates his statistical findings into a general appraisal of
the situation.² As indicated above, statistical analysis in these
cases is itself largely intelligent guessing. The statistical part
of an economic forecast is consequently merely the quantitative
ingredient of the final forecast.

¹ See discussion on rational vs. empirical trends, pp. [illegible].

² Tintner, Gerhard, “The Analysis of Economic Time Series,” Journal
of the American Statistical Association, Vol. 35 (1940), pp. 93-101. Wallis,
W. Allen, and Geoffrey H. Moore, “A Significance Test for Time Series
Analysis,” Journal of the American Statistical Association, Vol. 36 (1941),
pp. 401-409; Vol. 38 (1943), pp. 153-164.

Public utilities, especially the telephone companies, are
keenly interested in the subject of forecasting growth or trend
elements in time series. In the telephone business the laying
out of plans and the construction of new exchanges necessitate
some sort of forecast as to the future growth of the community.
For years these companies have maintained elaborate and
efficient research organizations whose business it is to forecast
trends in growth of population as well as the geographical
distribution of various types of business and residential areas
in the communities served.

Most business enterprises, however, are more concerned
about cyclical fluctuations than about trend or growth in time
series. For this reason the greatest number of published
forecasts have to do with the prediction of cyclical movements in
business conditions.

FORECASTING CYCLES WITH TIME SERIES

All that has been said about the inexactness of forecasting
trends by the use of time series applies equally to the forecasting
of cycles with time series. Nevertheless, the practice of relying
upon statistics as an aid to business is now so prevalent that
statistics, along with accounting, has become one of the standard
tools and one of the essential means of internal control of nearly
all economic enterprises, as well as a guide to public policies of
governmental agencies. Among its many commercial uses,
business forecasting is one of the most important, and it is along
this line that statistical analysis has been intensively developed
in recent years. Today there are several important agencies
that supply forecasting services. Among these are Standard &
Poor’s Corporation, Brookmire Economic Service, Moody’s
Investors Service, Babson, and the Harvard Economic Society.
In addition, many commercial banks such as National City,
Cleveland Trust, and Chase National include forecasts of
probable future business trends in their monthly letters.
Supplementing these professional services are the statisticians and
statistical departments of many large corporations, such as the
American Telephone and Telegraph Company, which make
forecasts for their own use.

American activity in this field has been internationally
contagious. As early as 1921 the publication of the Economic
Bulletin of the Conjuncture Institute was begun in Moscow; this
publication was devoted to the study of business cycles and
to the analysis of Russian statistics. Subsequently in nearly
every important European country during the 1920’s and
1930’s forecasting services were organized, sometimes by the
large universities. The League of Nations showed its intention
of inaugurating forecasting on a world scale by appointing a
Committee of Experts on Economic Barometers.¹

The many possible occasions when forecasting is required in
modern business can be shown by a few examples. A commercial
banker granting a loan must forecast the probability
of its being repaid; his judgment in this respect will depend on
his forecast of the borrower’s future earning power; this, in turn,
depends on his estimation of probable future price stability
in the borrower’s business. Similarly, a collateral loan will
involve a prediction, more or less precise, of the future value
of the security offered as collateral. A manufacturer needs to
forecast probable sales and probable prices of his own goods
and of materials he has to purchase, so that he can profitably
plan production and plant expansion. A public utility operator
needs to forecast population and industrial trends, construction
and operating costs, and probable prices for the service, in
order to determine when and where to build a railroad line, a gas
main, a power plant, or a telephone exchange.

All these things are commonplace in economic life, but the
growing complexity and interdependence of economic society
have made it increasingly difficult for the average businessman
to comprehend an existing situation in trying to formulate his
programs for the future. He is not a statistical expert. His
knowledge of methods of summarization and comparison goes
usually little beyond a vague comprehension of averages. To
aid him, it is the purpose of the various business forecasting
agencies “to provide the basis for business, financial, and security
market policy. Regardless of the inevitable margin of error
in every forecast, business, financial, or security market policy
which is geared to only a fairly intelligent estimate of future
probabilities is more likely to be sound than is policy geared
only to guess, or to no forecast whatsoever.”²

¹ Cox, G. V., An Appraisal of American Business Forecasts, pp. 1-2.

² “A Forecaster’s View of Forecasting,” Standard Statistics (June 15,
1931), p. 14. Also see Bratt, Elmer C., Business Cycles and Forecasting
(1941), pp. 736-800; Hardy, Charles O., and Garfield V. Cox,
Forecasting Business Conditions (1927).





Forecasting General Business Conditions. One of the most
important objects of economic forecasting is to predict general
business conditions, that is to say, the cyclical position of general
business. Good times and bad times are such important
elements in determining the prosperity of individual lines of
activity and of individual firms that the prospect for general
business is probably the first thing any corporation executive
wishes to know. Statistically, general business is properly
measured by some index of all business activity. It is the
summation of the whole and not merely one of the parts, although
an index of a part, say an index of industrial production, may be
taken as a barometer of the upswings and downswings of the
whole. Such series are commonly called “business barometers.”¹

Forecasts of general business conditions are based upon one
of two forecasting methods or a combination of the two. The
first method is known as “historical analogy,” the second as
“crosscut analysis.”²

The method of historical analogy is based on the assumption
that in cyclical fluctuations history tends to repeat itself. In
its cruder forms, this consists merely in forecasting the course of
general business, subsequent to some disturbance, from the
course of general business that followed a similar disturbance
in the past. For example, the forecaster might undertake to
predict the course of general business following the crisis of 1929
from the course of business following the crisis of 1873. In
more carefully thought-out form, historical analogy becomes a
business-cycle theory that attempts to explain how the interplay
of economic forces causes general business now to rise and now
to fall.

Crosscut analysis proceeds on the basis that the business
situation is never the same and that each new upswing or
downswing is the product of a set of factors different from those
previously operative. To understand the given situation all
the factors must be weighed as to their importance and a net
appraisal of the situation derived.

¹ See pp. 530-535.

² For more elaborate classifications see Bratt, op. cit., pp. 736-760;
Haney, L. H., Business Forecasting (1931), p. 195; Day, E. B., “The Role of
Statistics in Business Forecasting,” Journal of the American Statistical
Association, Vol. 33 (1938), p. 2.





In good forecasting, both methods are employed. If a certain
cyclical theory appears to constitute a good explanation of past
events, it is good forecasting practice to consider it in predicting
future cycle changes. Nevertheless, careful study must be
made to see whether the role played in the past by a particular
industry or economic development is subsequently being played
by some other industry or development. The user of historical
analogy must always, therefore, be on guard for changes required
in the statistical embodiment of the cyclical theory on which his
analysis is based in order to keep it up to date in its assumptions.
During the railroad era, statistics about railroads dominated
the scene as good indicators of general business conditions;
later, it was statistics about automobile production; perhaps
the time will come when it will be aircraft production. Again,
the present era is often regarded as the “iron age.” Statistics
of iron and steel production are often used as barometers of
general business conditions because so many of the products
of the modern age depend upon iron. Perhaps the time will
come when the emphasis will shift, from the point of view of
statistics, away from iron and steel production to the production
of the lighter metals such as aluminum. Who can say when the
world of business is changing from the one to the other?

Reflection along the lines indicated in the preceding paragraph
leads to the conclusion that continuous crosscut analysis is
needed as a means of verifying and justifying the use of the
historical-analogy method.

Forecasting by Historical Analogy. One type of forecasting by
historical analogy makes extensive use of the fluctuations in
particular time series that appear to lead general business
conditions. Examples of series that have been used as business
barometers are indexes of stock market prices, changes in unfilled
orders of the United States Steel Corporation, machine-tool
orders, and the loan-deposit ratio. These series, it is argued, will
tend to lead changes in general business conditions, and important
changes in general business conditions will first be made apparent
by them. For example, a clear and consistent downswing in
unfilled steel orders is presumed to presage a similar downswing
in general business. Hence the latter is presumably forecasted
from the former. In the case of the loan-deposit ratio, it is the
attainment of certain critical levels that is significant; when high
levels are reached, for example (i.e., when loans are high relative
to deposits), strained credit conditions are in evidence and a
crisis will be forecasted.

[Fig. 152: chart not reproduced. Legible legend entries: price of all
listed stocks; bank debits, 241 cities outside N.Y.C.]
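The claim that a barometer series "leads" general business can be checked roughly by correlating the barometer with business at successive later months. The sketch below is only an illustration under stated assumptions: the two series are invented, the function names `pearson` and `best_lead` are hypothetical, and lagged correlation is a modern convenience rather than a procedure described in the text.

```python
from math import sqrt


def pearson(x, y):
    """Coefficient of correlation between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def best_lead(barometer, business, max_lead):
    """Return (lead, r) for the lead, in months, with the highest correlation."""
    results = []
    for lead in range(max_lead + 1):
        # Pair the barometer at month t with business at month t + lead.
        x = barometer[: len(barometer) - lead]
        y = business[lead:]
        results.append((lead, pearson(x, y)))
    return max(results, key=lambda pair: pair[1])


# Invented data: business repeats the barometer three months later.
barometer = [100, 104, 109, 112, 110, 105, 99, 95, 94, 98, 103, 108, 111, 109, 104]
business = [100, 100, 100] + barometer[:-3]

lead, r = best_lead(barometer, business, max_lead=6)
print(lead, round(r, 3))
```

In real series the best lead is seldom so clean, and it tends to shift from cycle to cycle, which is the practical weakness of any barometer.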

More elaborate analyses making use of the historical analogy
for forecasting combine several economic series. A well-known
example of such a combination is that prepared by the Harvard
Economic Society and published in the Review of Economic
Statistics. While the society itself makes no forecasts from its
statistical series, they have been found useful for such purposes,
and it is generally understood that that is what they are published
for. These are shown in Fig. 152.

The Harvard series consist of three curves, known as the
A, B, and C curves. The A curve represents speculation, the
B curve business, and the C curve money. The actual data
upon which these curves have been based vary from time to
time. In those shown in Fig. 152, the curves are constituted as
follows:¹ The A curve, speculation, is based on the price of all
securities listed on the New York Stock Exchange. The B
curve, business, is based on bank debits in 241 cities outside
of New York City. The C curve, money, is based on short-term
money rates. In each of the constituent series the trend and
seasonal variation were removed (when it was deemed
appropriate) before the final indexes were computed.

The theory that underlies the use of the Harvard curves
for forecasting is that changes in speculation will generally
precede changes in general business and that the significance
of these changes will be more clearly understood when the
course of the money curve is noted. A sharp rise in speculation
at a time when money rates are low and still falling would appear
to forecast better business conditions. On the other hand, a
fall in speculation at a time when money rates are rising would
appear to forecast a decline in general business. If coupled with
a detailed crosscut analysis of the current business situation,
these curves are found very helpful in predicting general business
conditions.

The Harvard curves are but one set of curves that have been
employed in this attempt to forecast general business conditions.
Various combinations of curves have been used. A number
make use of capital issue by private corporations and capital
expenditures of the various government bodies. The idea
behind the use of investment curves is that the volume of income,
and hence business, is largely determined by the volume of
investment.

¹ Frickey, Edwin, “Revision of the Index of General Business
Conditions,” Review of Economic Statistics, Vol. 14 (1932), pp. 80-87.

As the result of a great amount of research work during the
past twenty or twenty-five years, mostly under the auspices
of the National Bureau of Economic Research or the National
Industrial Conference Board, but also by scholars in the United
States Department of Commerce, increasing attention has been
given to methods of measuring business conditions based upon
quantity and distribution of national income. Instead of indexes
of production, employment, volume of trade, and the like, these
new indexes attempt to measure national income and its
distribution, consumer expenditures and producer expenditures, savings,
capital formation, and the like. Figure 153 gives a picture of
annual consumer spending, 1919-1942, showing indexes
constructed by Kuznets (National Bureau of Economic Research)
and by the United States Department of Commerce.¹ Figure
154 is another illustration of the use of national-income statistics
and their derivatives to show the cycle in general business
conditions. This figure reproduces an index of that part of the
national income devoted to expenditures for new durable goods
and indexes of gross capital formation, net capital formation, and
offsets to savings. The United States Department of Commerce
index of private gross capital expenditures is presumably
equivalent to Kuznets’s gross capital formation; to these are added
indexes by Lauchlin Currie reputed to measure income-producing
Federal expenditures that offset savings and net government
contribution to savings. The index of expenditures for new
durable goods is constructed by the Board of Governors of the
Federal Reserve System.

Time and experience will reveal whether or not the national-income
type of indexes proves to be better than the barometer
or over-all measure of business activity types. The national-income
type has been made possible by the increasing amount of
statistical data on income resulting from the administration of
the Federal personal and corporate income taxes.

¹ Hoffenberg, and Mabel S. Lewis, “Estimates of National
Output, Distributed Income, Consumer Spending, Saving, and Capital
Formation,” Review of Economic Statistics, Vol. 25 (May, 1943), pp. 107-
174, 124.

[Figure caption, only partly legible: ... 1919-1942. From The Review of
Economic Statistics, Vol. 25 (May, 1943).]

Perhaps the greatest difficulty with all forecasting series is
that the amount of the lead or lag is likely to vary considerably
from time to time so that the timing of the forecasted change
becomes difficult.¹ Another difficulty is to judge how great a
change in the forecasting series must be before it is considered
significant. The curve is almost bound to show minor ups and
downs that are little related to general business. Presumably a
movement either up or down must be great and persistent before
any significant change is forecasted, but how great and how
persistent is the question. The answer to this question is always
easy to read ex post facto, but in following the forecaster from
month to month this is more difficult. If a lead is short and
data are not reported quickly, a given forecasting series,
consistent and reliable as it may be, is unlikely to have much
forecasting value, since the change would be under way before it
was manifested by the forecasting data.

These difficulties apply in differing degrees to the various
kinds of forecasters. In the case of the barometer type, which
is ordinarily dependent upon one presumably indicating series
such as unfilled orders of the United States Steel Corporation,
the data are usually promptly available, but the minor ups and
downs and the varying degree of lead and lag in the barometer as
compared with general business constitute ever-present
difficulties in their use. The indexes of general business activity
based upon combinations of several series are less affected by
difficulties with respect to lead and lag and minor fluctuations,
but it is often difficult to find a combination of series that are
promptly reported. The national-income type of indexes suffer
particularly from the fact that the data are not available
currently, except for estimates that are being attempted, and these
are dependent upon other types of data.

¹ But see p. 674 for application of leads and lags to the forecasting of
specific lines of business activity, in which it can be more successfully
applied.

A unique type of forecasting by historical analogy is employed
by Roger Babson. The forecasting instrument is the Babson
index of the physical volume of production. This covers
manufacturing production, mineral production, agricultural
marketings, building and construction, railway freight, electric power,
and foreign trade. The long-run trend of this curve is taken
as normal, and the cyclical fluctuations are forecasted on the
mechanical principle that a given action has an equal and
opposite reaction.¹ Thus the area of a given period of prosperity
will indicate the area, but not the shape, of the coming depression.
The slope of the depression area is forecasted to some extent
with the help of other series and contemporary crosscut analysis
of individual lines of activity.
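The equal-area principle can be put in arithmetic form: add up the deviations above the normal line during prosperity, subtract the deviations below it that have accumulated so far, and the remainder is the depression area still to come. The sketch below assumes monthly deviations from trend expressed in index points; the figures and the function name `remaining_depression_area` are invented for illustration and are not taken from the Babson index.

```python
def remaining_depression_area(deviations):
    """Area of depression still expected under the equal-area assumption.

    `deviations` holds index-minus-trend values, one per month, covering one
    period of prosperity and whatever part of the following depression has
    already occurred.
    """
    prosperity_area = sum(d for d in deviations if d > 0)
    depression_so_far = -sum(d for d in deviations if d < 0)
    return prosperity_area - depression_so_far


# Invented deviations: five months above normal, then three months below.
deviations = [2, 5, 8, 6, 3, -1, -4, -6]
remaining = remaining_depression_area(deviations)
print(remaining)  # index-point-months of depression still implied
```

The calculation fixes only the total area. How deep and how long the remaining depression will be is left, as the text says, to other series and to crosscut analysis.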

Forecasting by Crosscut Analysis. Even if considerable
reliance is placed upon certain forecasting series based upon the
historical-analogy principle, it would seem desirable to
supplement the analysis by a more detailed study of the current
situation. This will help to time the forecast better. It will
also assure the forecaster that the forecasting series continue to
hold their theoretical significance in the ebb and flow of business.
The great danger is that the business-cycle theory on which the
forecasting series are based may become outmoded or may be too
simple to be fully satisfactory as new conditions unfold.
Crosscut analysis may possibly reveal these defects and help to
remedy them.

Some believe that business cycles are unique and that the
roles played by various economic developments shift from cycle
to cycle. If this were true, crosscut analysis would be the only
logical method of forecasting. Some general theory would
necessarily have to underlie the forecast, even if it were the negative
theory that all cycles are unique. Nevertheless, it is necessary
to examine all the important sectors of the economy, to weigh
their relative importance in the given situation, and to determine
the net outcome. This requires comprehensive surveys and
shrewd judgment based on wide experience.

¹ Seasonal variations in individual series are eliminated.

Such agencies as the Brookmire Economic Service, Standard &
Poor’s Corporation, and Moody’s Investor’s Service generally
follow the crosscut method. The Brookmire Economic Service
watches carefully selected series, such as building construction,
motorcar output and registration, exchange rates, and industrial
employment. The importance attributed to the various series
differs from time to time. Also, new ones are added and old ones
discarded as the economic situation changes. In all cases where
warranted, the Brookmire Economic Service distinguishes
carefully between basic trends, seasonal variation, and business
cycles in its appraisal of the business outlook. The Standard &
Poor’s Corporation also watches many lines of activity and
forecasts development in each line. The forecast of general
business is mainly a summary of these many individual forecasts.
Moody’s Investor’s Service likewise bases its general forecast
upon many individual analyses. In making its forecasts,
however, Moody’s appears to be especially influenced by
businessmen’s anticipations of profits, a factor that receives much
emphasis in modern business-cycle theory.

Forecasting Particular Lines of Activity. The same methods
are used to forecast particular lines of activity as for general
business conditions. Crude historical analogy, the use of
leading series, and crosscut analysis all play their roles.

Crude Historical Analogy. Figure 155 is an excellent example
of the use of crude historical analogy in forecasting the course of
agricultural prices and of wage rates during a long and extensive
world war. Here the course of agricultural prices and wages in
the First World War is taken as the pattern for the expected
course of agricultural prices and wages in the Second World War.
From the proximity of the two series to each other until the
beginning of 1943 it would seem that the forecasting power of the
former series is relatively high. This method is of greater value
in forecasting particular lines of activity than it is when applied
to general business conditions, although crosscut analysis might
modify judgment of this forecast by pointing out that the efforts
at price control and inflation control in the Second World War
appear to be more courageous than they were in the First
World War.

Lead-Lag Relationships. Figure 156 illustrates the lead-lag
relationship in forecasting hog production. In raising hogs the
principal cost is the corn on which the hogs are fed.
Furthermore, the ratio between the amount of corn fed to a hog and his
weight is fairly constant. Hence, the profitability of hog
raising is essentially indicated by the so-called “corn-hog
differential,” which is the difference between the price of 100 pounds
of hogs and the cost of enough corn to raise 100 pounds of hogs.
As this differential increases, hog production becomes more
profitable; as it decreases, hog production becomes less profitable.
The increase or decrease in profitability affects hog production
with several months’ lag. Hence, changes in the corn-hog
differential can be used to forecast changes in hog production, as
shown in the figure.
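The corn-hog differential and its use as a leading series can be sketched in a few lines. Everything numerical here is an assumption made for illustration: the prices are invented, the feeding requirement of 10 bushels of corn per 100 pounds of hog is a round hypothetical figure, and the six-month lag merely stands in for the "several months" mentioned above.

```python
def corn_hog_differential(hog_price_cwt, corn_price_bu, bushels_per_cwt):
    """Price of 100 lb. of hogs less the cost of the corn needed to raise them."""
    return hog_price_cwt - bushels_per_cwt * corn_price_bu


def forecast_directions(differentials, lag):
    """Map each future month to the expected direction of hog production.

    The change in the differential between months t-1 and t is taken as the
    signal for production in month t + lag.
    """
    signals = {}
    for t in range(1, len(differentials)):
        change = differentials[t] - differentials[t - 1]
        if change > 0:
            signals[t + lag] = "up"
        elif change < 0:
            signals[t + lag] = "down"
        else:
            signals[t + lag] = "flat"
    return signals


# Invented monthly prices: dollars per 100 lb. of hogs, dollars per bushel of corn.
hog_prices = [8.0, 8.5, 9.75, 9.5]
corn_prices = [0.5, 0.5, 0.625, 0.75]
BUSHELS_PER_CWT = 10  # hypothetical feeding requirement

differentials = [
    corn_hog_differential(h, c, BUSHELS_PER_CWT)
    for h, c in zip(hog_prices, corn_prices)
]
signals = forecast_directions(differentials, lag=6)
print(differentials)
print(signals)
```

Note that the third month shows a higher hog price but no gain in the differential, because corn rose in step; it is the differential, not the hog price alone, that signals profitability.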



Fig. 155. Prices received by farmers and composite wage rates. Index
numbers, United States, 1914-1920, and 1939-1943. [From The Agricultural
Situation (May, 1943), p. 8, published by the Bureau of Agricultural Economics,
United States Department of Agriculture.]

The Cycle Hypothesis. The lag of hog production behind the
corn-hog differential not only permits forecasting of the former
but also tends to cause periodic upswings and downswings in the
two series. The reason for this is as follows: Suppose that the
demand for pork increases and, owing to the inability to increase
rapidly the production of hogs, the corn-hog differential rises.
This makes hogs more profitable to produce, and their number is
gradually increased. The lag in response, however, may cause
the differential to go higher than it would otherwise, and this in
turn might stimulate a greater increase in production than is
required to meet the new demand that caused the original rise
in the ratio. When this enlarged production comes on the
market, prices fall and the corn-hog differential drops. Owing to
the greatly increased supply, prices go lower than their natural
level and hog production becomes less profitable for a while.
The change in profitability causes hog production to drop off,
and ultimately prices tend to rise again, completing the cycle.

Fig. 156. Hog-corn price ratio and hog marketings, 1901-1942. (From Bureau
of Agricultural Economics, United States Department of Agriculture.)

This existence of a periodic movement in the corn-hog
differential and in hog production permits forecasting for some
distance into the future. If a great war does not interrupt the
normal course of economic forces, the high and low periods in
the corn-hog differential can be predicted with a fair degree of
accuracy. Wise hog farmers gain considerably from this
long-range forecasting. Similar periodic movements tend to appear
in other lines of agriculture in which production lags behind
price stimuli. For example, the cattle cycle runs about fifteen
years, according to studies made by the United States
Department of Agriculture.
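The mechanism just described, in which production answers to an earlier price and so overshoots, can be imitated with a very small model. The sketch below assumes a linear demand schedule and a supply that responds to the price of the preceding period; the coefficients and the function name `simulate_lagged_supply` are hypothetical, and the model is a simplified stand-in for the hog cycle, not a fitted description of it.

```python
def simulate_lagged_supply(a, b, c, d, start_price, periods):
    """Simulate prices when supply responds to the previous period's price.

    Demand: price = a - b * quantity.
    Supply: quantity = c + d * (price of the previous period).
    """
    prices = [start_price]
    quantities = []
    for _ in range(periods):
        quantity = c + d * prices[-1]   # producers act on the last price seen
        price = a - b * quantity        # the market clears what they bring
        quantities.append(quantity)
        prices.append(price)
    return prices, quantities


# Hypothetical coefficients; the equilibrium price is (a - b*c) / (1 + b*d) = 12.
prices, quantities = simulate_lagged_supply(
    a=20.0, b=1.0, c=2.0, d=0.5, start_price=15.0, periods=6
)
deviations = [p - 12.0 for p in prices]
print(prices)
print(deviations)
```

With these coefficients the price swings alternately above and below 12, each swing half the size of the last. A stronger supply response (a larger `d`) would make the swings die out more slowly or not at all, which is one way to picture a cycle that persists.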

Crosscut Analysis. The application of crosscut analysis to
particular lines of activity is based in many instances on the
analysis of supply and demand conditions. In agriculture, the
carry-over and current crop prospects are important factors
on the supply side. The economic condition of industries or of
sections of the population using the given product, the prices of
competing products, and the output of competing areas are
important factors on the demand side. If the product has
widespread uses, possibly prediction of changes in consumer
incomes or in general industrial activity might be the best way of
forecasting the future demand for it.

In manufacturing, principal attention is likely to be devoted 
to demand. When the demand is industrial, the forecasting 
takes primarily the form of predicting conditions in those lines 
of activity immediately supplied by the given line of manufac- 
turing. Thus steel production might be forecasted from railroad 
construction and maintenance, automobile production, road 
construction, and building activity. When the product is one 
sold to the consuming public and not to other industries, the 
analysis of demand becomes largely a study of the flow of income 
to consuming areas. This will be dependent on the prosperity 
of important industries in these areas and on the net flow of 
incomes from outside sources. The prices of competing products 
will also be an important demand factor. 

A statistical technique using multiple and partial correlation,
mathematical and graphic methods, has been developed for
making crosscut analyses such as those suggested in the two
preceding paragraphs. This technique is widely used; in the case
of many products the multiple- and partial-correlation technique
makes it possible to derive demand curves that will forecast with
considerable accuracy the amount of change in sales that would
accompany a given contemplated change in price.¹
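A demand relation of this kind can be sketched as a least-squares fit of sales on price and consumer income. The example is an illustration under assumptions: the data are invented and constructed to lie exactly on a plane, the names `solve` and `fit_demand` are hypothetical, and ordinary multiple regression is used here as the simplest relative of the multiple- and partial-correlation technique mentioned above.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * solution[k] for k in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_demand(prices, incomes, sales):
    """Least-squares fit of sales = b0 + b1 * price + b2 * income."""
    rows = [[1.0, p, y] for p, y in zip(prices, incomes)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * s for r, s in zip(rows, sales)) for i in range(3)]
    return solve(xtx, xty)


# Invented observations, built from sales = 100 - 4 * price + 0.5 * income.
prices = [10, 12, 11, 14, 13, 15]
incomes = [200, 210, 230, 220, 250, 240]
sales = [160, 157, 171, 154, 173, 160]

intercept, price_effect, income_effect = fit_demand(prices, incomes, sales)
# Expected change in sales from a contemplated price rise of 2, income unchanged.
change_in_sales = price_effect * 2
print(round(price_effect, 6), round(income_effect, 6), round(change_in_sales, 6))
```

The coefficient on price answers the question posed in the text: holding income fixed, it gives the change in sales expected from a contemplated change in price. Real data would not fit exactly, and the reliability of the coefficient would have to be judged as well.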

¹ Cf. Schultz, Henry, The Theory and Measurement of Demand (1938).

FORECASTS WITH SEASONAL VARIATION

Forecasting with seasonal variation is probably the oldest of
all types of modern forecasting and is so general as to be
commonplace. It is applied to particular lines of activity more
specifically than to general business conditions.

Historical Analogy. Use of historical analogy for forecasting
with seasonal variation is simpler and more dependable than the
use of historical analogy for cyclical or trend forecasts. The
conditions underlying persistent seasonal variations are more
readily analyzed than are the rational explanations of cycles and
trends. Moreover, the forecasting is for a shorter period into
the future and can therefore depend upon conditions remaining
approximately unchanged pending the outcome of the forecasted
events. Statistical techniques have been developed for
measuring the dependability of a given seasonal variation.¹

¹ See pp. 631-636.

Figure 157 illustrates the extrapolation of seasonal variation,
which is the use of historical analogy for making a forecast with
seasonal variation. From the figure it can be forecast, by
assuming a continued agreement between 1942 and coming 1943
seasonal movement in mortality from all causes, that the
September death rate per 1,000, annual basis, will be about 7.25,
the October and November rate about 8.25, and the December
rate about 8.30.

Fig. 157. Mortality from all causes, Metropolitan Life Insurance Company,
Industrial Department, weekly premium-paying business. Vertical axis: death
rates per 1,000, annual basis. [From the Statistical Bulletin, Vol. 24 (July,
1943), p. 12, published by the Metropolitan Life Insurance Company.]

Figure 158 is an application of the use of forecasting seasonal
variation by historical analogy to the field of agricultural
economics. Extrapolation of the 1943 curve predicts that income
from farm marketings in the South Central region of the United
States will fluctuate around 200 million dollars monthly until
July or August; thereafter, monthly cash income from farm
marketings in that region will rise sharply to a peak in October
of perhaps 500 million dollars or higher, since the 1943 level
appears to be a higher average than that of 1942. This figure
shows the annual average seasonal movement, 1937-1941, which
gives a somewhat more dependable seasonal indicator than a
single year’s figures.

Fig. 158. Cash income from farm marketings, 1942-1943, compared with
1937-1941 average. [From The Agricultural Situation, Vol. 27 (June, 1943), p. 8,
published by the Bureau of Agricultural Economics, United States Department of
Agriculture.]
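Averaging several years month by month, as the 1937-1941 curve does, can be sketched as a simple seasonal index. The two years of figures below are invented, the function name `seasonal_index` is hypothetical, and the method shown is the plain average-of-months calculation with no prior removal of trend or cycle, which a fuller treatment would require.

```python
def seasonal_index(years):
    """Average each calendar month over several years, as a per cent of the mean month.

    `years` is a list of 12-item lists, one list per year, January first.
    """
    count = len(years)
    monthly_means = [sum(year[m] for year in years) / count for m in range(12)]
    mean_month = sum(monthly_means) / 12
    return [100 * value / mean_month for value in monthly_means]


# Invented data: a slack first half-year and a busy second half-year.
year_one = [80] * 6 + [120] * 6
year_two = [90] * 6 + [110] * 6

index = seasonal_index([year_one, year_two])
print(index)
```

An index of 85 for a month means that month normally runs 15 per cent below the average month; the twelve index numbers average 100 by construction.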

Combining Seasonal with Cyclical Forecasting. Whenever it
is desired to make forecasts on the basis of a period shorter than
a year, it is necessary to apply a seasonal forecast along with
cyclical forecasting. In the case of conventional forecasting
by the use of business-cycle studies and the resulting barometers,
general business indexes, and crosscut analysis, discussed in a
preceding section of this chapter, short-period forecasts based
upon known seasonal variations are used as well as the cyclical
forecasts.

Many illustrations could be found of the application of this
combination of seasonal with cyclical forecasting. Figure 159
is an illustration in the field of agricultural economics. Based
upon statistical forecasting of the cycles in production of
livestock, similar to the cycle analysis of hog production already
outlined, the levels of livestock marketings for 1943 and 1944
are forecast. The annual amount is then distributed throughout
the months of the year according to the predetermined index of
seasonal variation. The figure presents the resulting forecast
of monthly transportation loads for livestock, estimated from
indicated marketings and shipments from public markets in the
United States. On the same figure are shown the actual amounts
monthly for the years 1941 and 1942, for purposes of comparison.
Figures similar to this one for various lines of industrial and
manufacturing activity appear frequently in such publications
as the Survey of Current Business and in the publications of the
various forecasting agencies.

Fig. 159. Transportation loads for livestock, estimated on basis of indicated
marketings and shipments from public markets, United States, January, 1941-
March, 1944. [From The Agricultural Situation, Vol. 27 (February, 1943), p. 8,
published by the Bureau of Agricultural Economics, United States Department
of Agriculture.]
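Distributing a forecast annual total over the months by a seasonal index is a short calculation. In the sketch below the annual total and the index are invented figures, and `distribute_annual` is a hypothetical helper name; dividing by the sum of the index (1,200 when the index averages 100) is simply a way of making the twelve monthly figures add back to the annual amount.

```python
def distribute_annual(annual_total, seasonal_index):
    """Split an annual forecast into twelve monthly figures by seasonal index."""
    if len(seasonal_index) != 12:
        raise ValueError("a seasonal index needs twelve monthly values")
    index_sum = sum(seasonal_index)
    return [annual_total * value / index_sum for value in seasonal_index]


# Invented figures: a forecast of 2,400 units for the year, with a light
# first quarter, normal middle months, and a heavy final quarter.
index = [50] * 3 + [100] * 6 + [150] * 3
monthly = distribute_annual(2400, index)
print(monthly)
```

Here the cyclical forecast supplies the level for the year and the seasonal index supplies only the shape within the year, which is the division of labor described in the paragraph above.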




THE QUALITY OF FORECASTING 

The success of forecasting is hard to judge. First it is to be
noted that if an agency declines to make forecasts in difficult
situations and makes rather limited forecasts in general, it is
likely to have fewer failures than one that boldly undertakes to
forecast on all occasions and in considerable detail. The success
of a forecasting agency should be judged according to what it
attempts to do.

The success of forecasting should also be judged in the light
of what might be accomplished by mere random guessing. In
other words, an agency should be right at least 50 per cent of the
time, or it is worse than useless. Judged on these bases, the
various economic forecasting agencies have been fairly successful
in forecasting general business conditions. Although not
registering anything near a perfect score, they have at least
been better than chance.
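The comparison with random guessing can be made concrete for forecasts of direction only. The sketch below assumes each forecast is simply "up" or "down" and that a guesser would be right half the time; the record of ten forecasts is invented, and the function names `hit_rate` and `chance_probability` are hypothetical.

```python
from math import comb


def hit_rate(forecasts, outcomes):
    """Return (number of correct forecasts, proportion correct)."""
    hits = sum(f == o for f, o in zip(forecasts, outcomes))
    return hits, hits / len(forecasts)


def chance_probability(hits, trials, p=0.5):
    """Probability of at least `hits` successes in `trials` pure guesses."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# Invented record of ten directional forecasts and what actually happened.
forecasts = ["up", "up", "down", "up", "down", "down", "up", "up", "down", "up"]
outcomes = ["up", "up", "up", "up", "down", "down", "up", "down", "down", "up"]

hits, rate = hit_rate(forecasts, outcomes)
by_chance = chance_probability(hits, len(forecasts))
print(hits, rate, by_chance)
```

A record of eight right out of ten would be matched or beaten by coin tossing only about 5 or 6 times in 100, so it is reasonably, though not conclusively, better than chance. The test says nothing about how difficult the situations forecast were, which is the first caution raised in this section.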

One of the chief problems of economic forecasting lies in
the effect of the forecast itself. The effect of the forecast may
conceivably be such, on the one hand, that the forecast actually
causes the forecasted event to occur, or, on the other hand, that
the forecast actually prevents the forecasted event from
occurring. Whether or not such untoward results are produced
depends largely on how widely the forecast circulates. On the
one hand, suppose a forecasting agency predicts a general
inflation of prices and enough people become convinced that the
forecast is a true one; in such a case, the forecast may not only
become true but be itself the cause of the thing that is forecasted.
On the other hand, a subscriber to a forecast expects to profit from
its use, in that his plans will anticipate probable future conditions
of which a competitor is supposedly not so well informed. The
fewer who have this information, the more likely it is that they
will profit and that the forecast will be a true one. But the
wider the acceptance of the forecast, the less chance the
individual subscriber has to gain and the less likely is it that the
forecast will prove to be true. Suppose, for example, that a
forecasting agency advises its clients in a given productive
activity that the price of its product is going to rise as a result of
some foreseen increase in demand; if too many of the producers
obtain the forecaster’s service and follow its advice,
overproduction will result and the price will decline rather than rise. This is
an illustration of how a forecast might defeat itself.

In the final analysis, it may be said that the greatest value of
modern forecasting work lies in the large amount of statistical
economic analysis that it promotes. Research into the business
cycle and continued improvements in the statistical approach to
social and economic problems cannot fail to reveal closer and
closer approximations to the truth and thereby improve general
knowledge about economic and social conditions.



APPENDIX 

Table I. Four-place Common Logarithms of Numbers





























Table I. Four-place Common Logarithms of Numbers. 

(Continued)



  N        0     1     2     3     4     5     6     7     8     9    10

 1.00  0.0000  0004  0009  0013  0017  0022  0026  0030  0035  0039  0043
 1.01    0043  0048  0052  0056  0060  0065  0069  0073  0077  0082  0086
 1.02    0086  0090  0095  0099  0103  0107  0111  0116  0120  0124  0128
 1.03    0128  0133  0137  0141  0145  0149  0154  0158  0162  0166  0170
 1.04    0170  0175  0179  0183  0187  0191  0195  0199  0204  0208  0212
 1.05    0212  0216  0220  0224  0228  0233  0237  0241  0245  0249  0253
 1.06    0253  0257  0261  0265  0269  0273  0278  0282  0286  0290  0294
 1.07    0294  0298  0302  0306  0310  0314  0318  0322  0326  0330  0334
 1.08    0334  0338  0342  0346  0350  0354  0358  0362  0366  0370  0374
 1.09    0374  0378  0382  0386  0390  0394  0398  0402  0406  0410  0414

 1.10  0.0414  0418  0422  0426  0430  0434  0438  0441  0445  0449  0453
 1.11    0453  0457  0461  0465  0469  0473  0477  0481  0484  0488  0492
 1.12    0492  0496  0500  0504  0508  0512  0515  0519  0523  0527  0531
 1.13    0531  0535  0538  0542  0546  0550  0554  0558  0561  0565  0569
 1.14    0569  0573  0577  0580  0584  0588  0592  0596  0599  0603  0607
 1.15    0607  0611  0615  0618  0622  0626  0630  0633  0637  0641  0645
 1.16    0645  0648  0652  0656  0660  0663  0667  0671  0674  0678  0682
 1.17    0682  0686  0689  0693  0697  0700  0704  0708  0711  0715  0719
 1.18    0719  0722  0726  0730  0734  0737  0741  0745  0748  0752  0755
 1.19    0755  0759  0763  0766  0770  0774  0777  0781  0785  0788  0792

 1.20  0.0792  0795  0799  0803  0806  0810  0813  0817  0821  0824  0828
 1.21    0828  0831  0835  0839  0842  0846  0849  0853  0856  0860  0864
 1.22    0864  0867  0871  0874  0878  0881  0885  0888  0892  0896  0899
 1.23    0899  0903  0906  0910  0913  0917  0920  0924  0927  0931  0934
 1.24    0934  0938  0941  0945  0948  0952  0955  0959  0962  0966  0969
 1.25    0969  0973  0976  0980  0983  0986  0990  0993  0997  1000  1004
 1.26    1004  1007  1011  1014  1017  1021  1024  1028  1031  1035  1038
 1.27    1038  1041  1045  1048  1052  1055  1059  1062  1065  1069  1072
 1.28    1072  1075  1079  1082  1086  1089  1092  1096  1099  1103  1106
 1.29    1106  1109  1113  1116  1119  1123  1126  1129  1133  1136  1139

 1.30  0.1139  1143  1146  1149  1153  1156  1159  1163  1166  1169  1173
 1.31    1173  1176  1179  1183  1186  1189  1193  1196  1199  1202  1206
 1.32    1206  1209  1212  1216  1219  1222  1225  1229  1232  1235  1239
 1.33    1239  1242  1245  1248  1252  1255  1258  1261  1265  1268  1271
 1.34    1271  1274  1278  1281  1284  1287  1290  1294  1297  1300  1303
 1.35    1303  1307  1310  1313  1316  1319  1323  1326  1329  1332  1335
 1.36    1335  1339  1342  1345  1348  1351  1355  1358  1361  1364  1367
 1.37    1367  1370  1374  1377  1380  1383  1386  1389  1392  1396  1399
 1.38    1399  1402  1405  1408  1411  1414  1418  1421  1424  1427  1430
 1.39    1430  1433  1436  1440  1443  1446  1449  1452  1455  1458  1461

 1.40  0.1461  1464  1467  1471  1474  1477  1480  1483  1486  1489  1492
 1.41    1492  1495  1498  1501  1504  1508  1511  1514  1517  1520  1523
 1.42    1523  1526  1529  1532  1535  1538  1541  1544  1547  1550  1553
 1.43    1553  1556  1559  1562  1565  1569  1572  1575  1578  1581  1584
 1.44    1584  1587  1590  1593  1596  1599  1602  1605  1608  1611  1614
 1.45    1614  1617  1620  1623  1626  1629  1632  1635  1638  1641  1644
 1.46    1644  1647  1649  1652  1655  1658  1661  1664  1667  1670  1673
 1.47    1673  1676  1679  1682  1685  1688  1691  1694  1697  1700  1703
 1.48    1703  1706  1708  1711  1714  1717  1720  1723  1726  1729  1732
 1.49    1732  1735  1738  1741  1744  1746  1749  1752  1755  1758  1761
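
The entries of Table I can be checked by direct computation. The following is a minimal sketch in Python, assuming only the standard library; the helper names `four_place_log` and `table_row` are illustrative and do not come from the text.

```python
import math

def four_place_log(n: float) -> float:
    """Common (base-10) logarithm of n, rounded to four decimal places."""
    return round(math.log10(n), 4)

def table_row(n: float) -> list[str]:
    """Four digits after the decimal point for n, n + 0.001, ..., n + 0.010,
    laid out like one row of the table (columns 0 through 10)."""
    return [f"{four_place_log(n + k / 1000):.4f}"[2:] for k in range(11)]

print(four_place_log(1.25))  # 0.0969
print(table_row(1.04))       # one row of digits, columns 0 to 10
```

Column 10 of each row repeats column 0 of the next row, which gives a quick consistency check when reading the table.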

















Table VI. Areas under the Normal Curve
Fractional parts of the total area (1.0000) under the normal curve between
the mean and a perpendicular erected at various numbers of standard
deviations (x/σ) from the mean.* To illustrate the use of the table, 39.065
per cent of the total area under the curve will lie between the mean and a
perpendicular erected at a distance of 1.23σ from the mean.

Each figure in the body of the table is preceded by a decimal point.


 x/σ     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09

 0.0   00000  00399  00798  01197  01595  01994  02392  02790  03188  03586
 0.1   03983  04380  04776  05172  05567  05962  06356  06749  07142  07535
 0.2   07926  08317  08706  09095  09483  09871  10257  10642  11026  11409
 0.3   11791  12172  12552  12930  13307  13683  14058  14431  14803  15173
 0.4   15542  15910  16276  16640  17003  17364  17724  18082  18439  18793

 0.5   19146  19497  19847  20194  20540  20884  21226  21566  21904  22240
 0.6   22575  22907  23237  23565  23891  24215  24537  24857  25175  25490
 0.7   25804  26115  26424  26730  27035  27337  27637  27935  28230  28524
 0.8   28814  29103  29389  29673  29955  30234  30511  30785  31057  31327
 0.9   31594  31859  32121  32381  32639  32894  33147  33398  33646  33891

 1.0   34134  34375  34614  34850  35083  35313  35543  35769  35993  36214
 1.1   36433  36650  36864  37076  37286  37493  37698  37900  38100  38298
 1.2   38493  38686  38877  39065  39251  39435  39617  39796  39973  40147
 1.3   40320  40490  40658  40824  40988  41149  41308  41466  41621  41774
 1.4   41924  42073  42220  42364  42507  42647  42786  42922  43056  43189

 1.5   43319  43448  43574  43699  43822  43943  44062  44179  44295  44408
 1.6   44520  44630  44738  44845  44950  45053  45154  45254  45352  45449
 1.7   45543  45637  45728  45818  45907  45994  46080  46164  46246  46327
 1.8   46407  46485  46562  46638  46712  46784  46856  46926  46995  47062
 1.9   47128  47193  47257  47320  47381  47441  47500  47558  47615  47670

 2.0   47725  47778  47831  47882  47932  47982  48030  48077  48124  48169
 2.1   48214  48257  48300  48341  48382  48422  48461  48500  48537  48574
 2.2   48610  48645  48679  48713  48745  48778  48809  48840  48870  48899
 2.3   48928  48956  48983  49010  49036  49061  49086  49111  49134  49158
 2.4   49180  49202  49224  49245  49266  49286  49305  49324  49343  49361

 2.5   49379  49396  49413  49430  49446  49461  49477  49492  49506  49520
 2.6   49534  49547  49560  49573  49585  49598  49609  49621  49632  49643
 2.7   49653  49664  49674  49683  49693  49702  49711  49720  49728  49736
 2.8   49744  49752  49760  49767  49774  49781  49788  49795  49801  49807
 2.9   49813  49819  49825  49831  49836  49841  49846  49851  49856  49861

 3.0   49865
 3.5   4997674
 4.0   4999683
 4.5   4999966
 5.0   4999997133


* This table has been adapted, by permission, from F. C. Kent, "Elements of Statistics"
(McGraw-Hill Book Company, Inc., 1924).
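
The illustration given with Table VI (39.065 per cent of the area lying between the mean and 1.23σ) can be reproduced numerically. The sketch below is only an illustration, assuming Python's standard `math.erf`; the function name `area_mean_to_z` is hypothetical and not taken from the text.

```python
import math

def area_mean_to_z(z: float) -> float:
    """Fraction of the total area under the standard normal curve lying
    between the mean and a perpendicular erected z standard deviations away."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

# The worked illustration from the table heading: x/sigma = 1.23
print(f"{area_mean_to_z(1.23):.5f}")  # 0.39065
```

Because the curve is symmetrical, doubling the result gives the area within z standard deviations on both sides of the mean.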































Table VIII. Hyperbolic Tangents¹

   z     r = tanh z       z     r = tanh z       z     r = tanh z

0.00 

’ 0.00000 

] 

0.55 

0.50052 1 

1. 10 1 

n 

0. 10 

.01000 


0.5G 

.50798 

1- 11 

ftnior. 

0.02 

.02000 


0.57 

.51536 { 

1.12 


0.03 

.02999 


0.58 

.52*267 1 

1.13 ! 

81 lOO 

0.04 

.03998 


0.59 

,52990 j 

1.14 

.81441 

0.05 

0.0499G 


0.00 

0.. 53705 

1.15 1 

0.81775 

o.oc 

.0.399.3 


O.Cl 

.54413 

1.16 

.82104 

0.07 

.00989 


0.62 

.5.3113 

1.17 

*82427 

o.os 

.07983 


0.63 

, 5-3805 

1.18 

.82745 

0.09 

.08976 


0.64 1 

.50490 

1.19 

.S305S 

0.10 

0.00007 


0.65 

0.57167 

1.20 

0.83365 

0.11 

.10950 


0.00 

.57830 

1.21 

.83668 

0.12 

.11943 


0.07 

. 58198 1 

1.-22 

-83965 

0.13 

.12927 


0.68 

..39152 1 

l.*23 

-84258 

O.W 

. 13909 


0.69 

..59798 ] 

1.24 

.84.546 

O.lo 

0.14889 


0.70 

0.60437 ! 

1.25 

0.84S28 

O.IG 

. 1580.3 


0.71 

.01008 1 

1.2G 

.85106 

0.17 

.10838 


0.72 

.01091 I 

1-27 

. 85380 

O.IS 

.17808 


0.73 

.02307 

i 1.28 

.85648 

0.10 

. 1877.3 


0.74 

.02915 

1.29 

.85913 

0.20 ! 

0.19738 


0.7.3 

0.63.315 i 

1..30 

0.SG172 

0.21 

.20097 


0.70 

.01108 1 

1.31 

.80428 

0.22 

.216.32 


i 0.77 

.04093 ! 

1.32 

.80678 

O.Ki 

.22003 


1 0.78 

.65271 ! 

1.33 

. 80925 

0.2i 

. 23350 


0.79 

.65841 i 

1 1.34 

.87167 

0.2.5 

0.24492 


0.80 

0.60-104 

1.35 

0.87405 

0.2G 

.23430 


0.81 

.66959 

1.30 

87039 

0.27 

.26302 


0.82 

.07507 

1.37 

. 87869 

0.28 

.27291 


0.83 

.08048 ! 

I 1.38 

.88095 

0.29 

.28213 


0.84 

.68581 

1.39 

.88317 

0..30 

0.29131 


0.85 

1 0.69107 

1.40 


0.31 

! .30044 


0.80 

1 .69626 

1.41 

.88749 

0..32 

.30951 


0.87 

1 .70137 

1.42 

. 88960 

0.33 

.31852 


0.88 

.70042 

1.43 

.89107 

0.34 

.32748 


0.89 

.711.39 

1.44 

.89370 

0..3.5 

0.33038 


0.90 

0.71030 ! 

1.45 

0.59569 

0.30 

.34521 


0.91 

.72113 

1.4G 

,89705 

0.37 

.35399 


0.92 

.72590 

1.47 

.89958 

0.3S 

.30271 


0.93 

1 .730.39 

1.48 

.90147 

0.39 

.37130 

1 

0.94 

.73522 

1.49 

.90332 

0.40 

0.3799.3 

1 

0.95 

0.73978 

1.50 

0.90515 

0 . a 

.38847 


0.00 

.74428 i 

1.51 

.90091 

0. 12 

. 39093 


0.97 

.74870 

1 1 . 52 

.90870 

0.43 

. 10.532 

1 

0.98 

.75307 

1 1.53 

.91012 

0.11 

.41304 

5 

0.99 

.75736 

! 1.54 

.01212 

0.45 

0. 12190 

\ 

1.00 

0.701.59 3 

i 1.55 1 

i 0.91.379 

0.4G 

.43008 

i 

1.01 

.70570 « 

1 . 56 1 

[ .91542 

0.47 

.4.3820 


1.02 

. 70987 5 

1.57 1 

S .91703 

0.48 

. 1402 1 


1.03 

.77391 ; 

1.58 1 

.91800 

0.49 

. 15422 

1 

1.04 

.77789 

1.50 1 

.92015 

0.50 

0.40212 

1 

1.05 

0.78181 

' 1. 60 

U.921G7 

0.51 

.40995 

1 

1.06 

.78500 

1.61 1 

92310 

0..')2 

.47770 


1.07 

.78940 

1 .02 

.92402 

0..j3 

.48.3.38 


1.08 

7932IJ 

1 . o:; 

92000 

0.51 



1 . 09 

79088 

1.01 

.92747 


¹ Source: Hodgman, Charles D., Mathematical Tables from Handbook of Chemistry and
Physics (1941).
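
The relation tabulated in Table VIII, r = tanh z, and its inverse can be evaluated directly. This is a minimal sketch assuming Python's standard `math` module; the function names `r_from_z` and `z_from_r` are illustrative only.

```python
import math

def r_from_z(z: float) -> float:
    """The tabulated relation: r = tanh z."""
    return math.tanh(z)

def z_from_r(r: float) -> float:
    """Inverse relation, z = arctanh r = 0.5 * ln((1 + r) / (1 - r)), for -1 < r < 1."""
    return math.atanh(r)

print(f"{r_from_z(0.55):.5f}")     # 0.50052, the table entry for z = 0.55
print(f"{z_from_r(0.50052):.2f}")  # 0.55
```

Reading the table from right to left (finding z for a given r) corresponds to `z_from_r`.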



Table VIII. Hyperbolic Tangents (Continued)



,-t h 


- t» 1 » 


- Unb * 






1 c 

I CO 

1 rr 

l B» 

1 09 

I 70 

1 72 

0 92880 
93022 

J31 j 
0328r 
9341 

03 ri 
03780 

2 20 

2 21 

2 22 

23 

2 24 

2 20 

2 7 

0 07 74 
97622 
97C08 
97714 
j77 D 

0 978(3 
0784 

9 888 

2 Ta 

1 70 

2 77 

2 78 

2 9 

2 8(1 

2 81 

2 82 

0 09166 
J92U2 

99233 

0 9020.3 
9-278 

99 92 

1 74 

I 

1 G 

1 77 

1 78 

1 7J 

I 80 

\ ai 

1 82 

1 M 
> 84 

94023 

0 04138 
042uO 
mi 

044 0 
94o7( 

0 94r8I 
94784 
94884 
94983 
S,}08( 

2 9 

2 30 

2 31 

2 32 

2 33 

2 34 

< 

2 30 

2 37 

2 38 

2 39 

9 070 

0 8(10 

98 87 
98121 
981GI 

0 08107 
98233 
98207 
98301 
9833 

2 84 

2 S 

2 60 

2 8T 

2 88 

2 80 

2 “U 

2 01 

2 92 

2 93 

2 94 

99320 

0 99333 
0934G 

9 ij' 

99372 

99384 

903 

04U8 

>9420 

99431 

9044} 

8 

1 80 
] 87 

I 88 

1 69 

0 17 

9 2 8 

B 350 

9 440 

9j 37 

2 40 

2 41 

2 42 

2 44 

0 083 7 
981^0 

98402 

08402 

2 J 

3 90 

2 or 

08 

2 00 

91.ll 

9404 

9947 

9948 
9O40S 

I 9U 

1 91 

1 02 

1 93 

1 04 

9 70' 

0 792 

9 873 

9 0 3 

2 40 

2 47 

2 48 

2 49 

98«22 

98 1 
9&J79 
08C07 
9803 

3 (1 

3 2 

3 3 

3 4 

00 jO 
J9(>6S 

097 8 
0777 

1 0^ 

I JG 

1 97 

1 98 

1 K> 

■» 01 

2 0 

2 01 

2 04 

2 Oj 
■> Ofl 

2 07 

0 Of 32 

floy 

9618 

9f7 9 
06331 

9C4 1 

473 

Of 41 
60(9 
0673 

00740 

9080.3 

0 80 

2 <0 

2 ol 

2 J.3 

2 

2 G 
■’ 7 

2 8 

2 fO 

61 

2 62 

0 98Wli 
08<88 
08714 
0873 

08 84 

n 98788 
8612 

883 

88 8 
J8SSI 

0 OS-HIS 
98921 

089 ir 

3 

3 6 

3 7 

2 8 

3 0 

4 0 

1 i 

4 3 

4 4 

4 7 

0 00818 
0*9851 
00878 
99900 
00918 

0 0 933 
0994J 

90 

099 4 
99970 

1 9997a 
09980 

2 03 

2 00 

2 10 

2 11 

2 12 

2 13 

2 14 

2 Ij 

2 10 

2 17 

2 18 

2 IJ 

90020 

OSf 

0 01 
97103 
7>i!> 
7213 

97 CO 

0 9 3 3 
717j 

74 

7177 

7 20 

2 03 

2 Cl 

2 nr 

S b7 

2 C8 

2 09 

2 V 

71 

2 7 

2 73 

2 74 

98 87 

9 >07 
09020 
aWMrf 

‘’'***gj 

0 JOIOI 
19118 

13 

r 1 3 

9* 17 

4 8 

4 0 

0 U 

9 989 

091 



AUTHOR INDEX 


A 

Agncw, H. T'l, 9 
Andpison, O. X., o7 

B 

Barker, P. W., 77 

Barlow, P., 209 

Barrows, H, K., 17 

Baumann, A. 0., 624 

Beckhart, B. H., 518, 575, 617-018 

Bergen, H. B., 16 

Berridge, W. A., 619 

Binder, Rudolph, 11 

Blume, Johannes, 249 

Bowditch, H. P., 322 

Bowley, Arthur L., 191, 552 

Boyd, Anne Morris, 65 

Bratt, Elmer C., 662, 603 

Brigham, Carl C., 10 

Brown, William A., 617 

Brunt, David, 294, 561 

Bryan, W. L., 322 

C 

Campbell, P. L., 15 
Campbell, Leon, 13 
Carmichael, F. L., 524, 562 
Carver, H. C., 190 
Chaddock, R. E., 174 
Cheyney, Edward P., 25
Chugerman, Samuel, 11
Clare, George, 617
Conklin, Maxwell R., 531-532
Cook, H. B., 17
Copeland, A. R., 250
Cowden, D. J., 174, 578, 633
Cox, G. V., 662
Cramer, H., 236
Creager, W. P., 17
Crowder, W. E., 176
Croxton, F. E., 104, 174, 578
Crum, W. L., 561
Cummings, John, 71-72
Czuber, Emanuel, 297

D 

D’Abro, A., 21 
Daly, Patricia, 658
Davenport, D. H., 63
Davies, G. R., 176
Davis, Michael M., 16
Day, E. E., 663
De Broglie, Louis, 19
Dewey, Edward Russell, 653
Dewey, John, 22
Dieulefait, Carlos E., 503
Director, Aaron, 619
Douglas, Paul H., 619, 658
Dublin, Louis I., 551
Duffendack, O. S., 19
Duncan, A. J., 165, 193, 242, 283,
287, 294-297, 300, 302, 304,
307-310, 315-319, 331, 492, 653
Dygert, W. B., 9

E

Eddington, Sir Arthur, 5, 18-19, 21-22
Ehrenberg, R., 651
Elderton, W. Palin, 196
Eldridge, J. A., 19-21
Ezekiel, M., 12

F

Fairchild, Henry Pratt, 27
Falkner, Helen D., 623 





1 ilkn.r, l!..lm.l P 'H. 

I lulkmr. II I , 2*. 

1 Hire, liiunKl, 25 
tVld, Jumps A , 131 
nne, H H , 481, 493-4&4 
Hspher, LutIwiK, 156 
Fisher, IrMiig, 131, 530 
lishcr, U ^ 4, lo9, >l>3, 005, fill 
Fosilick, Ua\ monel 13 , 122 
1 ouricr, t M C , 561 
1 rechet, 5Iauricc, 236, 250 
I reeman, H \ , 18 
I ricke\, Edwui, 5 >4, 006 
Frisboc, Ira \ , 024 
J r\ , '1 C , 249 

G 

Gaiiup, George, 6 

Ciulton, Sir Francis, 14, lOft, 293, 
323-324 

GarReld, Irank II , 530 
Garrett, H K , 11 
Gatiss, ICarl Fncdrtch, 295, 331 
Geary, R C , 305-306 
Gill, Corrmgton, 27 
Gilmore, F F , 9 
(.ipcf, 0 V,575 

II 

llu IS, Vrtlnir, 19 

Hamilton, Mcxanilei (il7 

Himiltoti, 4\iUi'ini, 0*2 

Hanej, L 11 , 003 

Hansen, AImh II , 019 

Hardy, Cliarlcs 0 , 602 

Harrison, G. IL, 9, 13, 21 

Hart, W L., G23 

llavford, F L., 8 

Ha>nes, B P , 77 

Hill, \ Bradford, 5, 16 

Hinncks, A F , 50 

Hoelgman, C C , C95-696 

Hoffenberg, Marvin, 007 

Hogben, Lancelot T., 101 

Holbrook, Stew art H , 9 

Holthiiiscn, Duncan McG , 638, W 4 


n<H.>k<r, II H,.t3l, Ifrfl 
Ifublwr.1 C W, 17 
Hunt, P , 575 » 

Huntington, E. V , 681-634 

J 

Jay, Vrynoss, 624 
Johnson, Norris Q , >31 
Joniin, Cluilis, 563 
Justin, J D , 17 

K 

Tvellei, Iriimm, 183 
Keinnierer, L W , 530, CIS, G2d 
Kent, 1 C , 693-094 
King, W 1 , 516, 557, 561, 025 
Kniblis, George H , 549, 552-553 
KolinogorolT, A , 236 
Kondraticfl, S , 557 
Kossons, M D , 7 
I\rutcli,J \V,27 
Kuezynski, Holiert V , 551 
Kurncts, Simon S , 554-357, .570, 
621 

1 . 

Henry \ , 633 
I,oaven3, Diekaoa H , 156 
Ix-gendre, Adrian Mane, 212, 351 
l«\y, U, 242-243 
l/*\\is, Mabel S , 007 
Lexis, tv HU, 570 
I.ilschuts, H , 19 
Ixilka, Alfred J , o51 
Lynd, II M , 11 
Ljud, R S, 11 

.M 

Malisoir, William M , 19 
Maynard, H B,9 
McCabe, David X., 620 
Miscs, Richard von, 245-251, 269 
Mitchell Wesley C, 67, 100, 131 
493-500, 513-514, 522, .535, 
553-5)4, 559, 565, 617-618 





Moore, Geoffrey H., 660
Moore, Henry L., 561-562
Myers, Margaret G., 617
Myers, Robert J., 657

N

Nagel, E., 242, 244, 246-247, 250
Neyman, J., 236
Norton, J. P., 617-618

O

Oparin, D. T., 557

P

Peabody, Leroy E., 553
Pearce, Thomas V., 153
Pearl, Raymond, 16, 549-551, 578
Pearson, E. S., 18, 305-306
Pearson, Frank A., 497
Pearson, Karl, 209, 293, 323-325
Persons, W. M., 66, 500, 530, 547,
565, 623
Planck, Max, 19-20
Poynting, J. H., 560
Prescott, Raymond B., 553
Pretorius, S. J., 190, 324

Q

Quetelet, A., 87, 549-550

R

Reed, L. J., 549-550
Renooj', D. C., 63
Ricf, William B., 657
Ridgway, R., 14
Rietz, H. L., 181, 196, 561, 623
Riggleman, John R., 624
Robb, Richard A., 623
Robinson, G., 573, 592
Roe, Anne, 14
Romanovsky, V., 563
Ross, Frank A., 584, 637
Roth, L., 242-243 


Ruark, A., 19 
Rugg, Harold O., 325 
Rutherford, E. R., 20 

S 

Schehifcld, A., 15 
Schell, E. H., 9 
Schmeckebier, Laurence F., 65
Schneck, M. R., 11
Schultz, Henry, 12, 675
Scott, F. V., 63
Sheppard, W. F., 298-299
Shewhart, W. A., 17-18, 196, 218,
657
Shields, Murray, 518
Shiskin, Julius, 633
Silberling, Norman J., 554
Simonton, W. A., 15
Simpson, C. G., 14
Smith, Adam, 558
Smith, Edwin S., 619
Smith, G. R., 77
Smith, J. G., 165, 193, 242, 283, 287,
294-297, 300, 302, 304, 307-310,
315-319, 331, 492, 617, 653
Snedecor, G. W., 15, 611
Snider, Joseph L., 535
Snyder, Carl, 534
Soergel, W., 14
Sogge, Tillman M., 619
Stebbing, L. S., 19-20
Stein, Harold, 104
Sterne, T. E., 13
Stevenson, T. H. C., 552
Struik, D. J., 236
Stuart, C. A. Verijn, 89

T

Thompson, H. D., 481, 493-494
Tinbergen, J., 12
Tintner, Gerhard, 570, 600
Tolman, R. C., 20 

U 

Uhu, Aaron Hardy, 551 





V 

\onn, I, 2J7 
\ orliiil't, I’ 1' , 5-'<0 

W 

\\ igciinnti, Lrns), 5o0 
W il.l, \ , 250 

Walker, Helen XI , 25, 19J, 323 321 
WaUn, W AlUn, faOO 
Waller, I 11,9 
W anlcll f \ n , 557 
W irieii, {.eorge 1 , 197 
Wjut,li \ L, 033-992 


Weldon, W 1 IL, 3’t 
We^t, Xluhiel, 9 
WliiUakcr, I 1 , o73, '.'Ji 
WiUunis, C n, H9 
\\»Vwn, 1 li , A94 
W ixKllicf, Thomas, >31~")32, 021 
W nght, Carroll D , 78 


\ule, O r.liij, 321, 540-551 


Zcucier, 1 1 1 



SUBJECT INDEX 


Accuracy, in calculating statistics, 230-231
Agricultural Situation, 80
Agricultural Statistics, 80
U.S. Department of Agriculture, 80
Bureau of Agricultural Economics, 80-81
Agricultural Situation, 80
Agricultural Yearbook, 80
Crops and Markets, 80
American Bankers Association, 51
Analysis of variance, in multiple correlation, 422-429
in nonlinear correlation, correlation index, 395-396
correlation ratio, 373-376
in simple correlation, 352-353
Annalist, 534
Arithmetic charts, 129-131
Array, 139-140
Asymmetry (see Skewness)
Attributes, variable, 157
Averages (see Frequency distributions, averages)
Avogadro's law, 57

B 

Banking statistics, sources of, 79-86
Federal Reserve, Board of Governors, 82
Federal Reserve Bulletin, 82
Member Bank Call Report, 82
National Monetary Commission, 83
Statesman's Yearbook, 80
Banking statistics, U.S. Treasury Department, 79
Abstract of Condition of National Banks, 79

Bar charts, 104-105 
Bayes, T., 242 
Bernoulli, Daniel, 242 
Bernoulli, Jacques, 242 
Beta coefficient, 192-193 
Beta cross-product term, 425 
Biennial Census of Manufactures, 
537 

Binomial distribution, symmetrical (see Symmetrical binomial distribution)
Bivariate frequency distribution, 325-353
first-order standard deviation, relation to r, 351-353
illustration of, 325-327
table, 326
joint variation illustrated (bivariate scatter diagrams), 339, 343, 345
methods of summarization and comparison, 327-353
Pearsonian coefficient of correlation, 338-349
analysis of variance, 352-353
calculation of, 347, 349
progressions of means, 328-329
illustrated (graphs), 328-329
Bivariate frequency surface, 471-480
bivariate histogram, 469-471
illustrated (three-dimensional diagram), 470
independent variables, 471
lines of regression, 486-488
mathematical representation, 487-488
nonnormal, 491-492




Kiviiiilc rr(.quuic> biirfut, luui 
normal, produet>moinont for 
miila for r, casoa for use or 
nonuse, 491-492 

norinjil, dependent variables, 477— 
4S6 

denvation of equation, 481-48(5, 
492-496 

equation of rotated ellipse, 481- 
482 

horizontal cross section, 481 
horizontal view (graph), 477 
illustrated (graph), 480 
mathematical reprcscnlatioij, 
432-486 

rotation and narrowing with 
tx>rrclation, 488-481 
\ertical \icw, 478 
ixonnal, ladependcnl ^anahlcs, 

472- 476 

circular form with equal stand 
ard doMatioDS, 470-470 
illustrated, 472 
normal cur\c from which <1<^ 
nved (graph), 473 
clhplieal form tvith unequal 
standard deviations 47t> 
horizontal set timi 470 
illustrated (graph) 47^ 
tnalhcmatical representation, 

473- 476 

liivariatc histogram, 4G9-47J 
illustrated (tlircc*dinicnsional dio,- 
gram), 470 

Bivariate scatter diagram, 303 
Bivariate senes, 149-134 
Boltzmann, L , 19 
Boscovich, U G , 242 
Bojle’s law, 19, 0a2 
Bradstreet's index, 522 
Bureau of I orctgn and Domestic 
Commerce, 56, 76-77, 312, j3o 
Bureau of Home Economics, 48 
Bureau of T,abor Statistics, 7, 42-30, 
34, 500, 535, 338 
indexes, 517, o25, 527 
Biueau of Afines, 79 
Business barometer (see Indexes) 


( 

fartograms, 112-121 
by bars, llS-119 
b> colors and shades, 121 
by cross-hatching, 116-117, 121 
by dots or points, 112-\l'j, U7- 
118, 120-121 
Cliarlcs* law, 57 
Charlier check, 209, 35t-33"» 

CTiarts, 100-121 
arithmetic, 129-131 
bar, 104-105 
bivariate, 150-154 
component-bar, 106-107 
cross-hatched zone, 107-109 
of frequency iliatnbutions, 143- 
149 

frequency polygon, 143-147 
histogram, 147 
on a ratio scale, 147-1 10 
piclogram, 102-103 
ratio, 131-137 

logarithms m lelalion to, 133- 
137 

sectors of circles, 104, 109-113 
split-bar, 110-112 
of time senes, 123-138 
lime senes iii rehtiv es, 130 
t}pes of, 101 

Chi square (x*) curve, 300 
Chi square (x*) test of goolncs-i of 
fit, 300-305 
the X* curve, 300 

critical V alucs for ^ , 

(table), 304 

weaknesses of test, 305 
Cbas uvterval, 144, VG4 
Clasoical concept of probabilitj, 
242-247 

Coefficient, confidence, 311-312 
moment, ISO 

of multiple correlation, 398, tl6- 
418 

of partial correlation, 418-422 
ofmk 310-311 





Coefficient of correlation, arithmetical view of, 339-347
Charlier check, 354-355
computation from grouped data, 357-362
short method, 357, 359-362
tabulation of given data (table), 356
computation from ungrouped data, 354-357
work sheet (table), 355
distinguished from correlation ratio, 365
first-order, 422
order of, 422
Pearsonian, 339
relationship to line of regression, 349-351
second-order, 422
third-order, 422
zero-order, 422
Combinations, 233-236
binomial expansion in, 234-236
defined and illustrated, 233-234
Combinatorial analysis, problem in, 270-283
Commercial and Financial Chronicle, 70
Commercial statistics (see Sources of statistical data, commercial)
Commodity Yearbook, 71
Component-bar charts, 106-107
Confidence coefficient, 311-312
Confidence interval, 313
Consumers' Incomes in the United States, 48
Correlation, applications of, by social scientists, 324-325
best way of studying, 365
bivariate frequency table, 357
coefficient of, Pearsonian, 339
zero-order, first-order, second-order, etc., 422
multiple (see Multiple correlation)
nonlinear, 365-396
(See also Curvilinear regression)


Correlation, origin and development of measurement of, 322-324
partial (see Partial correlation)
progress in discovery of, 321-322
ratio, 365-376
simple, 321-364
Correlation coefficient (see Coefficient of correlation)
Correlation index, 394-396
Correlation ratio, calculation of, 368-373
explained, 365-368
Cournot, A., 12
Covariance, 405
Coxe, Tench, 72
Curve, error, 294-295
frequency, theoretical significance of, 162-166
Gaussian error, 194, 294-295
growth vs. frequency, 149
normal, characteristics of, 265
formula for, 263-267
method of fitting to sample histogram, 299-300
normal frequency, 232-320
characteristics of, 265
formula for, 263-267
(See also Normal frequency curve)
probability, 254
of regression, 367
standard normal, characteristic of, 266-267
formula for, 267
Curve fitting, curvilinear regressions, 376-397
fitting normal curve, 299-300
fitting trends to time series, 564-616
Curvilinear regression, calculation of, 376-394
correlation index, 394-396
and analysis of variance, 395-396
correlation ratio, and analysis of variance, 373-376
calculation of, 368-373
work sheet (table), 371
explained, 365-368





Curvilmc ir rtgrtssioii, fstini itc& 
bihisd on jrgrcK&ioii equations 
3S8-390 

iHuaVraVcd, by bwansvlc scatter 
dugraiu and fitted ciirvi^ 377 
itlafioiiahi]) in ioguriUiiiiic forui 
(graph), 378 

nUtionship in icctpioeil fonii 
(giaph), 381 
loganlbinic, 377-38U 
illustrated (graph), 370 
pr ictical oslimntcs based on 
equation derived 388-300 
Blruidanl erior of utiniate, cal- 
culated 390-391 
traii&fonimtioii of problem into 
ainiplo linear eurit) itiou, 370- 
3S0 

]) irabolic, 383-388 
(iiiv( fill* d direitly, 381 
Doolittle method for soKing 
three <quatiouii, 3BW8S 
Viorh shell (tablo), 380 
liniLtieil estiniat<8 b'tscd on 
I ipiiition di rived, 3SS-390 
stiiMcIard ordir of csUinalc, enl 
«uUt(d 390-394 
M iproiid 381-383 
illiistniti d (grjpln, 3Si 
jriKtical CBtiinulia bused on 
I quiition derived, 388-390 
“•taiidjxd error of estimate, eal- 
eulalcd, 390-391 
Irunsfonnatioii to siniplc linear 
eorrclntiuii, 331-333 
stuiulurd error of cstiiiiAie, 390- 
391 

caltulaad, 391-393 
ililTcrcnces for three tyjwa of 
regression, 393-394 
i?rs(-cnfcr s^indurd Jaulion 
used as, 390-391 
summarized with practical esli- 
iiiutrs (table), 393 
ti»e Ilf J’c-arscmi m coelliuuit of 
lorulatiuii, m logantlinnc 
ipproach, 379 

in iceiproeal ipproach 3S2 


f jile detenimiution, 037-C50 
HI annual data, 637-C42 
annual trend analysis (tiiblc 
and graph), 6-10 
iveliiul inovLinents shown 
(table), 041 

ma;or cycle and ejtle with 
icsiduals (graph), 042 
danger in extrapolating trends, 
648 

major cycle, 641-642 
method of ratios vs method of 
difforcnecs, 048-050 
m montlilyr data, adjustment re- 
quired, 037, 042-044 
danger m extrapolating treude 
C48 

iiietiiod of ihUnnining cyih 
illustrated. 011 017 
'V hero trend is i seeon J* or third 
degree polynomial, 047-0 18 
uorh sheet (table) 043 
Cy«l<s,015, 037 030 
nn ilysis by empiric d tieniU, 391- 
308 

o^vo-Uke, 332 

]) 

Dalii,cuinuhti\uvs noneumul itivi, 
127-128 

galiicriiig of, 2i-ol 
coiislniction of qmstioim iin s 
or schedules, 30-12 
rational basis (or, 28-29 
sampling, 42—19 
units of description and nuns- 
urement, 28-18 
(6€e uUo Questiotmurcs 
Sihctluhs) 

•jourcca of («« Sources of stivli^U 
eol d ita) 

tluoo types of statistieal, 4-0 
De Moivrt, \ , 212 
Density fiiuelion, 489 
m description of imiltivariale 
divtnhutums, 488—190 





lJfliuiuin:itioii of )\oi iiuiliU (m. 

Xormahty) 

Dowey, .John, 22 

Directory of Federal Stalilind 
Agencies, 64 

Di'^fribution, of freqiienoj (see Fre- 
quency clLStnbutions) 
of probability (see Prohabilifj 
clLstnbutions) 

.symmetnoal bmoinial (sre S\ni- 
mctrical binomial dislribn- 
tion) 

Domesday Book, 2j 
Doolittle IS oik sheet for curvilinear 
correlation, 384, 386-388 
Doolittle work sheet for curvihneai 
rcf^res-sion, siork sheet (table), 
380 

i; 

Eddington, .Sii Arthur Slanlcj, 
18-19, 21-22 
Einstein, Albert, 21 
Empirical trends, .5S2-.598 
analysis of eyeles by, .594 -,j 9S 
straight-line and third-degree 
tremls with raw data, illus- 
trated (graph), 597 
tontlu-sions from tiends dc- 
lived, 598 

work sheet for trend and index 
of normal (t.ible), 594 
work sheet foi trend values, 
method of hmte differences 
(table), 597 

finite differences method for trend 
xaluc,s, 589-594 

aid for computing finite differ- 
ences at i = 0 (table), 590 
building up a pohnomial 
(table), 589 

d.ingei of cumulative error, 
593-594 

maximum cumulated errors 
(table), 593 

worksheet (table), 4,592 
polynomial, 5S3-594 


Liiijiirii .il (leiul-'. jioImioiiii il, (iiiii- 
omy of calculation in work 
sheet, 583-589 

economical work sheet, alge- 
biaic illustration (table), 

585 

economical w ork sheet, arith- 
metical illustration (table), 

586 

work sheet for second-degree 
polynomial (table), 588 
s(r.aight-hne trend, 582-583 
work sheet for index of normal 
and trend, 582 

I'miracration, districts, U.S census, 
35 

problems of, 28-42 
Enumcratois, directions to, 35-39, 
42, 44 

training of, 30, 42, 44 
typical problems facing, 29 
Equiprobabiiity, ellipsoids of, 4S9- 
490 

Erroi curve, 294-295 
Erroi, standard, of estimate, 383 
for statistics where s.imphiig 
distribution approximates 
normal curve (table), 320 
Estatlislica, 90 

Kstimatc-s, nianufactureis, 7 
Euler, L , 2-12 

Extrapolation of treniLs, d.angei in, 
648 

F 

Federal Reserve Bank of New \ ork, 
517 

Monthly Review of Credit and 
BnsineJis Conditions, .534 
Fedeial Re&er\e Board, 500, 532 
index, .533 

Federal Reserve Bulletin, 86, 531-532 
Federal Reserve System, 512 
Federal statistical agencies, 71-84 
(See also Sources of statistical 
data) 

Financial statistics, sources of (see 
Sources of statistical data, 
fin.im ei!) 





Huiti, dilTiniK-o, iiicihod uf iuuliui; 
trend •\ allies, 589-5M 
danger of cumutativo error, 
593-5&4 

First-order standard deviation, do- 
fioition, 390 
1 ishcr, Irving, 518, 530 
1 orccasting, G51-6S0 
agencies 661-002 
Babson, 601, 671-672 
Brookinire Lconomic Society, 
601, 671-672 

Harvard Fconomic Socict>, 
061, 071-673 

Moodv'a Investor's Service, 
001, 071-072 

Standard & Poor’s Corporation 
661. 671-672 

nneient origin of psoudo-scicntific, 

051 

combined seasonal and codicil, 
077-678 

illustrations of 67$ 
tommcrcisl uses of 661-0G2 
Gjdos vnth timo senes 661 675 
general business conditions, 
004-673 

business barometer 664, 666, 
670 

combination indexes, CGG- 
667 670 

crosscut anal} sis mcihod, 
664-605, 071-673 
historical analogy method, 
063-071 

indexes of national income, 
067-670 

indexes of ph)SicaI volume of 
production (Babson index), 
070-671 

lead lag dillicultics in fore- 
casting, 070 

types of indexes, 664-673 
particular lines of activitj, 673- 
675 

crosscut analjsis method, 675 
cnida historical analogy 
method, illuctraled, 673 


Forecasting, cycles with time series,
particular lines of activity,
cycle hypothesis for, 674-677
lead-lag relationships, 674
from distribution studies, 656-661
bivariate distributions, 657-658
errors of forecasts, 659
monovariate distributions, 656-
657

multivariate distributions, 658-
659

modern scientific, 652-656
conditional, 653-654
illustrations of, 654-656
popular dramatization of fore-
casts, 652-653

qualitative vs. quantitative, 654
use of statistics in, 656
quality and effect of economic
forecasting, 678-680
with seasonal variation, 675-678
historical analogy, 676-677
trends with time series, less exact
forecasting, 660-661
more exact forecasting, 656-660
Foreign trade statistics, sources of, 77
Fourier's theorem, 561
Fréchet, Maurice, 250
Frequency concept of probability,
247-249

Frequency curves, definition, 163-164
derivation from histograms, 162-164
formulas for, 263-267
normal (see Normal frequency curve)
uses of, 164-166
in graduating observed data, 164-165
as a norm, 165
in sampling analysis, 165-166
Frequency distribution analysis,
numerical computation, 199-231

arithmetic mean, rule for, 214 
averages and variability, difficulties
in locating median and
mode, 216-217





Frequency distribution analysis, 
numerical computation, beta 
coefficients, 216 
calculations, 216-227 
average deviation, 220-224 
histogram assumption in 
grouped data, 221-224 
mid-value assumption in 
grouped data, 220-221 
averages and variability, diffi- 
culties in locating median 
and mode, 216-217 
coefficients, of skewness, 226-227

of variability, 225-226 
measures of skewness, 225 
median and quartiles, 218-220 
mode, 217-218 
semiquartile range, 224-225
construction of class interval, 
199-206 

determining the class inter- 
val, effect of too many 
intervals, 202 

interval size chosen to reveal
character of variation, 202-207

illustrative material, distribu- 
tion with various class inter- 
vals (tables), 203-205 
mean square deviation, 215

moments about the arbitrary 
origin, 207, 211-212 
moments about the arithmetic 
mean, 212-214 

scatter diagram and graph, 201 
standard deviation, 215-216
with unequal class intervals, 
228-230 

variability and skewness, graph- 
ic interpretation of, 227-228 
work sheet, 206-216 
Charlier check for, 209
entering the distribution, 208 
illustrated (.table), 2>6 
saving calculation, by obtain- 
ing moments about an 
arbitrary origin, 207 


Frequency distribution analysis,
numerical computation, work
sheet, saving calculation,
in use of work sheet, 208-
209

by using class-interval units,
207-208

theory, 158-198 
averages, 167-180 

arithmetic mean, 167-170 
concept of, as summary fig- 
ures, 179-180 
geometric mean, 173-176 
harmonic mean, 176-179 
median, definition, 170-172 
theory, mode, definition, 172- 
173 

basic formulas used in, sum- 
mary of, 198 

beta coefficients, 192-193, 195 
bivariate (see Bivariate fre- 
quency distribution) 
charts of, 162-164 
histograms, 162-163 
area histograms, 162 
relative frequencies in, 162- 
163 

determination of normality of, 
297-306 

frequency curves (see Frequency 
curves) 

kurtosis of, 193-196
measurements of summarization 
and comparison, 166-182 
measures of variability, average 
deviation, 183-184 
quartiles of, 171-172 
range, 182-183 
standard deviation, 184-185 
variance, 166, 185 
moments of, 180-182 
the centroid, 181 
moment coefficient, 180 
purpose of, 181-182
multivariate (see Multivariate 
frequency distribution) 
of populations, 166-167 
parameters of, 167 





Frequency distribution analysis,
theory, possible types of comparison,
158-162
vs. probability distributions, 254
of sample data, 167
statistics of, 167
sampling distributions (see

Sampling distributions)
skewness of, 185-193
symbols used in, summary, 197
trivariate (see Multivariate frequency
distribution, trivariate)
use of, 158-160

Frequency distributions, 140-149
conventional manner of graphing,
140-144

discrete vs. continuous, 143-145
irrational, 156

nature and illustration of, 140-143
rational, 155-157

Frequency polygon, relative slope at
a given point computed (graph),
288

1 requenty series, 138-H3 
definition 138-139 

(See aUo Frequency disiribulion) 
1 requency surfaces, 400-102 
bivariate, 471-4SC 
biianate liistogram 460-471 
multivariate (density function) 
488-491 

Frequency table, 143
Functions, compound interest, 261-
262

discount, 263
explicit, 255-256
exponential, declining, 262-263
rising, 261-262

functional relationships, 255-257
hyperbolic (table), 695-696
implicit, 255-256, 259-260
joint, 253
linear, 256
nonlinear, 256-257
simple, 257-263

(See also Simple functions,
graphs of)


G 

Galton, Sir Francis, 14, 196, 293,
323-324

Gaussian error curve, 191, 294-296
Geological Surs e\ VSG 
Gompertz logistic curve, 551
Goodness of fit, described, 287
illustrated (graph), 288
test of, 300-305

(See also Chi-square (χ²) test
of goodness of fit)

Government Publications and Their
Use, 65

Graphs, (see also Charts)
of simple functions, 257-267

(See also Simple functions,
graphs of)

Graunt, John, 65-66
Growth curves, 119

(See also Rational trends)

explanation of, 553
Guides to sources, 62-65
governmental, 64-65

Directory of Federal Statistical
Agencies, 64

Government Publications and
Their Use, 65

U.S. Government Manual, 64
U.S. Government Publications,
65

nongovernmental statistics, 62-64
handbooks and general index
material, 63-64
magazine indexes, 62-63
Guilds, early sources of statistics,


H 

Handbooks, 57

Harvard College Observatory, 13
Harvard index, 665-666
Heisenberg, W., 19-20
Heisenberg's uncertainty measurement,
19-20

Histograms, in frequency distributions,
162-163





Hollerith, Herman, 55 

Hollerith tabulating machines, 55, 
72-73 

Huygens, C., 242 

Hyperbolic functions (table), 695-
696

Hyperplane of regression, 490-491
I 

Index to Business Indexes, An, 63

Index chart, capital formation, 669 
consumer spending, 668 
Harvard, 665

Indexes, adjustment to bench marks, 
535-542 

ideal conditions for stratified
sampling absent, 535-536
method of adjustment illustrated,
538-542

monthly indexes adjusted to 
census figures (table), 540- 
541 

reasons for adjustment, 536-538 
of correlation (see Correlation
index)

of general business conditions, 
533-535 

Harvard, 665-666
method of computation illustrated,
528

price indexes, aggregative, using 
given-year weights, 529 
of production, 82, 530-531 
quantity indexes, and business
barometers, 530-535 
stratified sampling in, 530 
of trade and production, 530- 
531 

computation of weights illus- 
trated, 532-533 
weighted by prices, 529-530 

relative series from time series
(chart), 130 

U.S. Bureau of Labor Statistics, 
construction of indexes, 525- 
527 


Index numbers, 497-542
application of sampling technique,
513-514

stratified sampling, 515-516
composite, 512-513
stratified sampling in construc-
tion of, 513

construction of, aggregative
method, simple, 522-524
weighted, 524-525
average-of-relatives method,
simple, 518-520 
weighted, 520-522 
methods in general, 518 
conversion of absolute to relative 
numbers, 500-511 
absolutes, 500-501 
relative parts of a whole, 509- 

511 

relatives, 501-503 
relatives using a base period in 
time series, 503-505 
presumption of normality in 
base selected, 505-509 
history of discovery and use, 497-
500

simple, great variety in use, 511-512

variety of purposes of, 516-518
Industrial statistics, sources of, 66, 
77, 83 

Bureau of Manufactures, 77
The Economic Almanac, 67 
Industrial Commission, 83 
I.Q., 10

International Statistical Yearbook, 85
Intuitive-axiomatic approach to
probability, 250-251
Iron Age, 68 

J 

Journal of the American Statistical
Association, 534 

K 

King, W. I., 516
Kolmogoroff, A., 250





Kurtosis, 192, 193-196
Kuznets, Simon S., law of growth,
explanation of, 554-558


L

Labor statistics, sources of, 61, 74-
75, 84

Commission on Industrial Relations,
84

Department of Labor, 61

Bureau of Labor Statistics, 78,
80

Monthly Labor Review, 78
Lagrange, J. L., 242
Lambert, J. H., 242
Laplace, P. S., 242-243, 250
Law of large numbers, 239-240
League of Nations, 88-91
indexes, 512
publications, 512

Least squares, method of, to find
line of regression in bivariate
frequency distribution, 331-334
Legendre, A., 242
Line of regression, 329-335
becomes hyperplane of regression
in multivariate distribution,
490-491

derived by method of least
squares, 331-334
interpretation of, 336-338
means of rows and columns
(table), 330

relationship to r, 349-351
standard deviation about means
or line of regression, 336-338
standard deviations for columns
of data given (table), 337
of X1 on X2, 330-334
illustrative diagram, 332
of X2 on X1, 335
illustrative diagram, 334
Linear plane of regression, second-
order variances for, 413-416
Lines of regression in bivariate frequency
distribution, calculated
from given data, after computing
r, 362-363


Lines of regression, work of computation
in fitting to time series
when more than two coefficients,
599

Loci of equiprobability, in multivariate
frequency "surface," 489
Logarithmic charts (see Ratio charts)
Logarithmic regression, 377-380
Logarithms, of numbers, four place
common (table), 681-684
scale for ratio charts, 131-137
Logistic growth curves, (see Rational
trends)

M 

Magazine indexes, 62-63
Market Research Series, 63
Maximum likelihood, method for
single best estimate of population
percentage in sampling,
314-315

Means, progressions of, graph, 328-
329

Measurement of General Exchange
Value, The, 500
Minerals Yearbook, 58
Mises, Richard von, 245-251, 269
Mitchell, Wesley C., 500, 514, 530,
553

Business Cycles: The Problem and
Its Setting, 499, 518, 535
law of growth, explanation of, 553
Monthly Bulletin of Statistics, 85
Monthly Labor Review, 78
Multiple correlation, 397-436
analysis of variance in, 422-429 
coefficient of direct determination,
424

illustrated (diagram), 424
coefficient of joint determination,
425

coefficient of multiple correlation,
425-428

coefficient of net regression, 
424-425 

r-beta cross-product term, 424-





Multiple correlation, analysis of variance
in, r-beta cross-product
term, illustrated (diagram),
424

residual variance, illustrated 
(diagram), 424 

analysis of variance and causal
relationships, 428-429
coefficient of, 398, 416-418
definition, 397-399
extended to any number of variables,
434-436

extension of formulas, high- 
order variances, 436 
multiple-correlation formulas, 
436 

partial-correlation formulas, 
436 

statistics for regression 
planes, 435-436 
general approaches, 434-436
extension of analysis to four 
variables, 429-434 
multiple correlation coefficient, 
434 

partial correlation in four-vari- 
able case, 433-434 
in terms of correlation statistics 
of same order, 432-433 
in terms of lower-order correla- 
tion statistics of same kind, 
431-432 

in terms of lower-order r's and
σ's, 432

third-order variance, 434 
linear vs. nonlinear relationships, 
399-400 

notation used in, 401-404 

meaning of subscripts before 
and after point, 402 
symmetry of, 404
partial correlation (see Partial 
correlation) 

Multiple linear regression, 410-416 
beta form of regression equation, 
410-413 

obtained by method of least
squares, 410


Multiple linear regression, beta form
of regression equation, a's and
b's calculated from beta form,
412-413

second-order variances for linear
plane of regression, 413-416
Multivariate frequency distribution, 
404-410 

analysis, illustrated, 437-468 
trivariate statistics, interpreta- 
tion of results, illustrated, 
analysis of variance in X, 
451, 455 

estimates based on regression 
equation, 450-451 
partial-correlation coefficients,
451

best approaches in studying, 409- 
410 

calculation of trivariate statistics, 
444-450 

all-round check on, 450 
equations of three planes of 
regression, as found, 448-449 
first-order correlation statistics, 
445-450 
a statistics, 448 
b statistics from the beta’s, 448 
coefficients of partial correla- 
tion, 447-448 

first-order beta's from zero- 
order r’s (table), 446 
interpretation of results, illus- 
trated, 450-451, 455 
multiple-correlation coefficients, 
449-450 

second-order standard devia- 
tions, 449 

zero-order correlation statistics, 
444-445 

examination of, 437, 442-444
by testing net regression, 437, 
442-443 

trivariate, 405-410 

conditions for independence of 
all variables, 408-409 
illustrated (diagram), 405





Multivariate frequency distribution,
trivariate, studied by breaking
up into bivariate distributions,
406-408

trivariate analysis illustrated, correlation
tables of, 406-407
correlation table of X1 and X2,
406

X2 and X3, 406
X1 and X3, 407

Multivariate frequency surface, nonnormal
distribution, 491-492
normal, 488-491

deviations normally distributed,
490

effect of greater correlation on
ellipsoid shape, 490
ellipsoids of equiprobability,
489-491

illustrated (graph), 490
in reality a density function, 489


N

National Bureau of Economic Research,
67

National Industrial Conference
Board, 66

National Research Planning Board,
publications, 84

National Resources Committee, 48
New Jersey State Labor Dept., 'iiS
New York Times, The, "iSl
Newtonian mechanics, 19-21
Neyman, J., 230, 250
Nonlinear correlation, 365-396

(See also Curvilinear regression)
Nonnormality, in bivariate or multivariate
distributions, 491-492
of population in sampling, 316
Normal frequency curve, 232-3^
algebraic and graphic representa-
tion of, 263-267
algebraic formula, 264
graph, 263

graphs of curves with different
means and same standard
deviations, 265


Normal frequency curve, algebraic
and graphic representation of,
graphs of curves with same
mean and different standard
deviations, 266
area under (table), 693
fitted to histogram of given data
(graph), 295

method of fitting to sample histogram,
299-300
ordinates of (table), 694
real life conditions producing,
290-297

recurrence in statistical analysis,
264

standard normal curve (graph),
267

and symmetrical binomial distribution,
279-306
use in theory of sampling, 264
useful approximation to binomial
distribution where N is large,
303

Normal frequency surface, 469-496
(See also Bivariate frequency
surface; Multivariate frequency
surface)

Normal probability curve (see Normal
frequency curve)

Normality, determination of, in binomial
distribution, 297-306
bj ixrmpanvm of Rjioiialslatis- 
iMs, 30V300 

in frequency distributions, 297-
306

by graphic comparison, 298-300
fitting normal curve to sim-
ple histogram, method of,
299-300

by test for goodness of fit (see
Goodness of fit, test of)
in time-series indexes, 505-509
of population, in sampling, 316-
317

O

Order, of correlation coefficients, 422
of correlation statistics, 422





Order, designation in correlation statistics,
indicates combination of
variables, 422
of regression statistics, 422
of standard deviations, 422
of variance, 363-364, 413-416
Orthogonal-polynomial trends, 599-
616

calculation of coefficients A, B,
C, , 606-607
by subtotal summation type of
work sheet, 608-612
orthogonal polynomials, definition,
600-601

forms used in fitting trends,
603-606

tables to save calculation, 612-615
values of specified variables,
dependent on number of
years (tables), 613-615
trend line by method of least
squares, 601-603
uses in trend analysis, 599-600

P 

Parabolic regression, 383-388 
Parameter, definition of, 167 
Partial correlation, coefficient of,
418-422

calculation, 420-422 
definition, 400-401 
notation used in, 401-404 
obtained between two variables 
by holding third variable con- 
stant, 419-420 
Pascal, B., 242 

Pearl-Reed population curve, 519 
Pearson, Karl, 66, 293-294, 323-
325, 339 

Pearson-Galton apparatus for bi- 
nomial distribution, 293-294 
illustrated, 293 

Pearsonian coefficient of correlation, 
338-349 

arithmetic view of r, 339-347
Pepin the Short, 24 
Percentage, population percentage, 
313-315 


Periodogram, 561-562
Permutations, defined and illus- 
trated, 232-233 

Persons, Warren M., 66, 530, 623 
Petty, Sir William, 65-66
Pictograms, 102-103 
Planck, Max, 19 
Playfair, William, 100-101 
Polling agencies, 6 
Polynomials, definition, 250 
first-degree, graph of, 257-258
implicit, 255, 259-260 
second-degree, graph of, 258-259 
Population, curves, 548-549 
laws of growth, 549-550 
technical term in frequency dis- 
tribution, 166 
theories, early, 549-550 
Prescott, Raymond B., 554 
Presentation of statistics, 92-121 
cartograms, 112-121

(See also Cartograms)
charts, 100-121

(See also Charts)
tables, 92-100 

Probability, combinations, 233-236 
concepts of, 236 
classical, 242-243 
criticism of classical concept, 
243-247 

meaning of "equally likely,"
244

principle of indifference, 244-
245

principle of sufficient reason,
245-246

subjective character of, 246-
247

frequency concept, 247

criticism of von Mises'
theory, 247-250
intuitive-axiomatic approach,
250-251

curve, formulas for, 263-267

(See also Frequency curves;
Normal frequency curve)
definition, 237 
dependent, 271-272 





Probability, empirically determined, 
240-241 

independent, 270-271
independent vs. dependent illustrated
in real life case, 273-274
law of large numbers, 239-240
permutations, 232-233 
of possible combinations of 10 
coins (table), 282 
randomness, 241-242 
and relative frequency of actual 
events, 239-242 

Probability calculus, 268-278
addition theorem, 268-269
for dependent probabilities, 271-
272

examples of calculation, for discrete
distributions, 274-276
for continuous distribution,
276-278

independence vs. dependence illustrated
in real life case,
273-274

for independent probability, 270-
271

multiplication theorem, 269-274
statement, 269

Probability distributions, 232-267
continuous, 253-254
discrete, 253

functional relationships in (see
Functions)

identical with certain types of
frequency distribution, 254
probability curve, 254

(See also Normal frequency
curve)

Probability sets, calculation of “de- 
rived" or “second-order” sets, 

ims 

finite, multiplication theorem valid
for, 272 

fundamental, 237-238 
infinite, 238-239 
multiplication theorem valid 
for, 272 

Problem of Estimation, The, 500


Product deviation, measurement of,
339-347

Product-moment coefficient of correlation,
339ff.

Product-moment formula for r, use
in nonnormal frequency distributions,
491-492
Product term, definition, 485

disappears where correlation is
absent, 485

Public opinion, sampling of, 6
Publications, statistical (see Statistical
publications)

Q 

Quality control, 18, 248 
Quantum theory, 19-21 
Quartiles, calculation of, 218-220
definition, 171-172
interpretation, 227-228
use in measuring skewness, 189-
191

Questionnaires, mailed, 48-49

goodwill letter used in support
of (typical form), 50
rules for constructing, 49, 51
(See also Schedules)

Quetelet, A., 87, 499, 513, 549-550

R

Ratio charts, 131-137
advantages and disadvantages of,
135-137

paper used for, 133-135
relative growth shown on, 131,
136-137

three scales of paper used for,
134-135

value for comparisons impossible
on arithmetic paper, 136-137
Ratio scale (see Semilogarithmic
paper)

Rational trends, 547-558, 574-581
dying institution, illustrated, 574-
577

possible trends in dying institution,
574





Rational trends, dying institution,
illustrated, trend fitted by
method of least squares
(graph), 577

work sheet for annual index 
of normal and trend 
(table), 576 

growing institution, illustrated, 
578-581 

curve fitted by method of 
selected points (graph), 581 
method of selected points,
578-581 

work sheet for index of nor- 
mal and trend (table), 580 
Reciprocal regression, 381-383 
Reciprocals of numbers (table), 691- 
692 

Regression, linear plane of, 397-399, 
410-413 

(See also Linear plane of 
regression) 
logarithmic, 377-380 
multiple linear, 410-416 
parabolic, 383-388 
reciprocal, 381-383
statistics, order in, 422 
Relative frequency, probability, 280 
Relativity theory, 21 
Research associations, 66-68
Review of Economic Statistics, 66,
500, 530

S 

Sampling, 42-48 

by Bureau of Labor Statistics, 
42-48 

fitting of normal curve to sample 
histogram, 299-300
in government study of family 
income and expenditures, 48 
of means, 315-319 

population mean, confidence 
limits for, 318 
estimate of, 318 
testing a hypothesis about, 
317-318 

sampling distribution, 315-316 


Sampling, testing a hypothesis, 317-
318

in 1940 U.S. Census, 42 
of percentages, 307-315 
coefficient of risk, 310-311
confidence coefficient, 311-312
confidence interval, 313
population percentage, deter- 
mining confidence limits 
for, 311-313 

likelihood of, defined, 315
likelihood of, relation to
probability of sample (diagram),
314

maximum likelihood, estimate
of, 313-315

testing hypothesis about,
309-311

sampling distribution, 307-309
statistical inferences from, 309-

315

types of inference, 309
typical problem, 307
random, 241-242, 307
relative frequency of samples 
follows binomial distribution 
pattern, 308 

sampling distributions (see Sampling
distributions)
standard errors for selected statistics,
where distribution approximates
normal curve
(table), 320

stratified, in construction of index 
numbers, 515-516 
use of normal frequency curve in,
307-320

conclusions as to, 319-320
used in business, 9
of variances, population variance,

316

confidence limits for, 319
optimum estimate of, 319
testing a hypothesis about,
318-319

sampling distribution, 315-316
standard deviation, the, 317





Sampling distributions, 307-319
explained, 307-308
of sample means, 315-316
of sample percentages, 307-309
of sample variance, 316-317
Schedules, coding of, 53-55
editing of, 52-53

mailed questionnaires (see Questionnaires,
mailed)
problems of enumeration, 23-42
questionnaires (see also Questionnaires)

tabulation of, 55

(See also Tables; Tabulation)
units of description and measurement,
23-34

illustrations of government care
in, 35-48

Seasonal variation, 617-636

causes of, 618-621

historical background of study,
617-618

in labor, McCabe, 626
measurement illustrated, 625-633
calculation by 12 months' moving
average method, 625,
630-633

completed index (table), 631
multiple frequency array, determinations
from, 633
illustrated (graphs), 631-632
work sheet for calculating index
(tables), 626-630

method of detecting change in,
633-636

computation of index for single
year (table), 636
index required for each year
because of observable trend,
636

trends in seasonal variation
illustrated (graphs), 634-635
methods of measuring, 621-625
Kemmerer, 623
Persons, 623

problem of isolating, 621-625
link relative method, 623


Seasonal variation, problem of isolating,
ratio difference from
trend method, 621
twelve months' moving average
method, 624-625
various suggested methods,
bibliography for, 624n
testing whether well defined, 632
trend in, 634-635

Second-order, indicates statistic with
two figures to right of decimal
in subscript, 422

Semilogarithmic paper, 133-135
Series, bivariate, 149-154
frequency, 139-149
Sheppard's correction, 299
Shewhart, W. A., 248
Significant figures, meaning of, 230-
231

Simple correlation, 351-353
lines of regression, calculated, 
362-363 

Simple functions, graphs of, 257-267 
circle, 259-260
ellipse, 260

exponential function, declining,
262-263
rising, 261-262

first-degree polynomials, 257-
258

normal frequency curve, 263-267
second-degree polynomials, 258-
259

Simpson, C G , 14, 242 
Single best estimate, of population 
percentage in sampling, 314-315 
Skewness, definition and significance
of, 185-193

measurement of, by beta coefficients,
192-193

by medians and quartiles, 189-
191

by relation of mean, median,
and mode, 185-189
by third moment, 191-192
Smith, Adam, 11-12
Smithsonian Institution, 13
Social Science Research Council, 48





Social Security Administration, 537 
Sources of statistical data, 56-91 
(See also Guides to sources of 
statistics) 

agricultural, (see Agricultural sta- 
tistics) 

banking, (see Banking statistics, 
sources of) 
commercial, 68-71 
commercial and financial publi- 
cations, 70-71 
trade associations, 69-70 
trade journals, 68-69 
federal, 71-84 

Congressional investigations, 82 
financial, 70 

commercial and financial publi-
cations, 70-71 

general summary, developing pat- 
tern of sources, 59-62 
for social sciences, 57-58 
guides to (see Guides to sources) 
industrial, (see Industrial statis- 
tics, sources of) 
international, 91 

on labor, (see Labor statistics, 
sources of) 

pattern of existing (outline), 61-62 
primary vs. secondary, 56
private research, individuals, 61, 
65-66 

handbooks on, 63 
research associations, 66-68
(See also Statistical publica- 
tions) 

state and municipal, 84-85 
on trade, (see Trade) 
on transportation and com- 
munication, (see Transporta- 
tion and communication 
statistics) 

world statistics, 85-91 
best sources of, 91
Split-bar charts, 110-111 
Square roots of numbers, 100-1000
(table), 689-690
Squares, 100-990
of numbers (table), 685-686


Standard deviation, first-order, 338
from lines of regression, 336-338
zero-order, 338 

Standard error, of variance, 310 
of estimate, 338 
definition, 390 

Standard Industrial Classification
Code, 54 

Statesman’s Yearbook, 86 
Statistic, definition of a, 167
Statistical Abstract of the United 
States, 64 

Statistical Atlas, 73 
Statistical data (see Data) 
gathering of (see Data, gathering 
of) 

Statistical laws, 19
Statistical publications, abstracting
agencies, 57n
world statistics, 85-91

(See also Guides to sources)
Statistics, accuracy in calculating, 
230-231 

in the arts and sciences, 1-23 
in astronomy, 12-13 
in biology, 14-16 
in business administration, 7-9 
definition and meaning of, 1-1 
descriptive, 232 
in economic theory, 11-12 
in education, 9-11 
in engineering, 16-18 
forecasting by means of, 651-680
gathering of, 24-55

historical development in, 24-28
(See also Data, gathering of)
in governmental administration,
6-7 

in medicine, 16 
and philosophy, 21-22 
in physics and chemistry, 18-21
in politics, 6 

presentation of (see Presentation
of statistics)
in sociology, 11

sources of (see Sources of statistical
data)





Statistics, summarization and comparison
by means of frequency
distributions, 158-
198

by means of index numbers,
497-542

measurements for, 166-182
symbols used in (see Symbols used
in statistics)

theoretical, definition, 232

by use of index numbers, 497ff.
in zoology, 13-14

Summarization and comparison, in
bivariate frequency distributions,
327-353
measurements of, 166-182

Survey of Current Business, 77, 512,
525, 531, 534

Symbols used in statistics, 122-129
basic symbols, 122-124
multiple and partial correlation,
401-404

time series, 124-126
passage of time, 124-125
units involved, 126-127
where variable fluctuates with
time, 125-126

Symmetrical binomial distribution,
character of, 283-285
mean, 283-284
moments, 285
symmetry, 283
variance, 284-285
derivation, 280-283
graph of, 284

and the normal curve, 279-306
beta values approach those of
normal curve, 289
distribution approaches normal
curve as limit, 285-290
graphic comparison, 288
relative slope of frequency polygon
and normal curve compared,
289

real life conditions producing,
290-297

summary, 295-297


Symmetrical binomial distribution,
relative slope of frequency polygon
computed for a given point
(graph), 288

for two values of N (graph), 280
effect of scale adjustments
(graph), 287

seen in relative frequencies,
282, 291-293

Symmetry, in frequency surfaces,
486-487

in notation for multiple and partial
correlation, 404
writing of equations by, illustrated,
431-432

T

Tables, 92-100

construction of, 92-93, 95-96
general purpose, 93-94
special purpose, 93, 95-97, 100
types of, illustrated, 94-99
Tabulation, machine, 55, 72-73
mechanics of, 73
principles of, 92
(See also Tables)

Test of goodness of fit, 300-305
(See also Chi-square (χ²) test of
goodness of fit)

Theorem, Fourier's, 561
Theory of errors, 294-296

not intended to mask inaccuracy
of calculation, 230
Theory of relativity (see Relativity
theory)

Third-order statistics, 422
Time series, analysis of (see Cycle
determination; Seasonal variation;
Trend analysis)
careful description of units in- 
volved, 126 

conventional charting of, 128-129
cumulative vs. noncumulative
data, 127-128

elements of variation in, 543-547
cycle, 544-547

long-term growth or trend, 543-
547





Time series, elements of variation
in, residual fluctuations, 574
seasonal variations, 543-547
hypothetical, showing elements of
variation (table), 544
rational basis of analysis of, 543-
563

rational trends, 547-558, 564
Time-series analysis, development of
technique for, 560-563
harmonic (periodogram) analysis,
561-562

major cycle determination by
Kuznets' methods, 561
ordinary and minor cycle determination
by empirical
methods, 561-562
use of functions of arc tangent,
562

use of orthogonal polynomials,
562, 599-616
empirical trends, 558-560
application to cycle analysis,
558-560

empirical vs. rational trends, 564
rational basis for, 543-563
rational trends, application to
social philosophy, 553-558
basis for rationalizing, 550-552
criticism of, 552-553
early population theories, 549-
550

historical background, 547-548
population curves, 548-549
(See also Cycle determination;
Seasonal variation; Trend
analysis)

Trade, Department of Commerce, 80 
Commerce Yearbook, 80 
domestic, 81 

Federal Trade Commission, 81 
foreign, 77, 82, 86
Bureau of Foreign and Domestic
Commerce, 77
Statistical Abstract of the
United States, 77
Survey of Current Business, 77
Statesman's Yearbook, 86


Trade, U.S. Tariff Commission, 82
Transportation and communication
statistics, 81

Interstate Commerce Commis- 
sion, 81 

Treasury Department, 7
Treatise on Money, J. M. Keynes,
517

Trend analysis, 564-616
detecting cycle by removing empirical
trend, 565
empirical vs. rational trend, 564
empirical trends, illustrated, 582-
598

analysis of cycles by empirical
trends, 594-598

finite differences method for
finding trend values, 589-594
polynomials, 583-594
straight-line trend, 582-583
methods of fitting trend, 565-574
by averages, 573
moving averages method,
573-574

by least squares, 565-573
advantages of method, 574
basic method, 565-568
numerical illustration, 568-
569

probability theory not applied,
570-571

second- or third-degree
curves, 569-570
by selected points, 571-573
orthogonal-polynomial trends (see
Orthogonal-polynomial
trends)

rational trends, illustrated, 574-
581

dying institution, 574-577 
growing institution, 578-581 
Trends, empirical (see Empirical 
trends) 

orthogonal-polynomial (see Orth- 
ogonal-polynomial trends) 
rational (see Rational trends) 





Trivariate frequency distribution,
405-410

Twentieth Century Fund, The, 66
U

U.S. Bureau of Census, 42-48, 53-
56, 532

U.S. Census, 30-38, 532
development of, 71-76
U.S. Census of Agriculture, 38-42
U.S. Department of Agriculture, 12,
524 (see also Agricultural statistics)

U.S. Department of Commerce, 56,
504, 512, 524, 531
Bureau of Census (see U.S.
Bureau of Census)

Bureau of Foreign and Domestic
Commerce, 56, 76-77, 512,
535

Survey of Current Business, 525,
531, 534

U.S. Department of Labor, 525
U.S. Government Manual, 64
Units, careful description necessary
in time series, 126-127
of enumeration, 1
of measurement, 1-2
Univariate frequency distribution,
325

V 

Van Buren, President Martin, 72
Variable attributes, 157
Variable, continuous, illustrated,
292-293

discrete, illustrated, 291-294
but not integral, illustrated, 292
integral, illustrated, 291-292
nonintegral, illustrated, 292
Variable Y, in an array, illustrated
use of, 139-140
Variability, 182-185

average deviation, 183-184
range, 182-183


Variability, standard deviation, 184-
185

universal condition facing scientist,
122

Variance, calculation of, 215
definition of, 185
first-order, calculated, 363-364
defined, 338
meaning of, 336-338
relation to r, 351-352
proportion measured, by square of
correlation coefficient, 353
by square of correlation index,
396

by square of correlation ratio,
373-376

sampling distribution of, 316
second order, for linear plane of
regression, 413-416
third order, 434

Variation, frequency series, 138-143
static frequency distribution as
tool for analysis, 158
static vs. dynamic, 154-155
Venn, J., 247

Verhulst growth curve, 550-551
W

Walker, Francis A., 72
Ward, Lester Frank, 11
William the Conqueror, 25
Works Progress Administration, 53,
538

World Almanac, 64
World Economic Survey, 85
World Peace Foundation, 91
Wright, Carroll D., 78


Y

Yule, G. Udny, 321, 549-551
Z

Zero-order statistics in correlation,
444



